STRATEGIES FOR GENE EXPRESSION ANALYSIS

CROSS REFERENCE TO RELATED APPLICATIONS 
[0001] This application claims priority to and benefit of United States Provisional 

Application Number 60/397,393, filed July 19, 2002, the disclosure of which is 

incorporated herein in its entirety for all purposes.

COPYRIGHT NOTIFICATION 
[0002] Pursuant to 37 C.F.R. 1.71(e), Applicants note that a portion of this 

disclosure contains material which is subject to copyright protection. The copyright owner 

has no objection to the facsimile reproduction by anyone of the patent document or patent 

disclosure, as it appears in the Patent and Trademark Office patent file or records, but

otherwise reserves all copyright rights whatsoever. 

BACKGROUND OF THE INVENTION 
[0003] There are numerous biotechnology applications in which the researcher is 

interested in the changes in gene expression of a moderate set of genes, for many hundreds or

thousands of biological samples. Over the last decade, gene expression analysis has proven
to be an extremely valuable tool for monitoring the state of cells, and specific pathway 
responses to different stimulations and environments. This ability to both broadly survey 
cellular activities and to track differential and dynamic responses means that expression 
tools have been able to provide significant insight into cancer and other disease genetics. 

The current state of the art in gene expression is represented by two very different

technologies, microarray analysis and real-time rtPCR. Each technology offers major 
targeted benefits, with microarrays enabling large-scale surveys of thousands of genes for 
small sets of samples, and real-time rtPCR providing high sensitivity, high accuracy 
measurements of small sets of genes for hundreds to thousands of samples. There is, 

however, a technological gap that is not fully served by either of these technologies.

[0004] Multiple experimental applications exist where there is an interest and a need 

to screen moderate sets of genes, e.g. 20 to 100 genes for hundreds to thousands of samples. 
For example, to fully capture the activities of functional pathways such as apoptosis or 
angiogenesis, it is necessary to track between 50 and 100 genes. In fact, linear and 

nonlinear statistical techniques have been successfully applied to the analysis of microarray
data and it is clear that correlation and cluster analysis generally collapses the responses of 
thousands of genes to a much smaller set of representative genes and response types. For 



example, Thomas et al. (2001) Molecular Pharmacology 60:1189-1194, have used this
approach to identify 12 key transcripts out of 1200 that can predictively track 5 major 
toxicological responses. Van't Veer et al. (2002) Nature 415:530-536, recently
demonstrated that a set of 70 genes, out of 25,000 tested, could provide a prognostic 
signature for metastases in breast cancer patients, and that the expression profile
outperformed other clinical parameters used to predict disease outcome. 
[0005] Another major area of interest for a high throughput gene expression assay is 

compound library screening. The pharmaceutical drug discovery process has traditionally 
been dominated by biochemical and enzymatic studies of a designated pathway. Although 

this approach has been productive, it is very laborious and time-consuming, and is generally
targeted to a single gene or defined pathway. Today, the predominant screening assay 
formats fall into two categories: gene specific and phenotypic. Gene-specific screens, such 
as protein binding assays and reporter gene assays, focus on capturing the effects of a given 
compound on a single gene or protein endpoint, while phenotypic screens typically capture 

gross cellular changes, such as apoptosis, cell proliferation, or ion flux. Both of these
screening approaches have significant value, but they are not optimal for screening 
compounds with respect to their effects on a multiplicity of genes involved in a complex 
disease, such as cancer. Gene-specific screens are too focused and cannot observe 
multigenic responses to perturbations. Cell-based phenotypic screens are too broad and 

cannot be used to differentiate the multiple pathways that can be altered to produce a
phenotypic response, nor can they effectively be used to optimize and direct compound 
development toward specific mechanisms of action. Molecular biology and the development 
of gene cloning have dramatically expanded the number of genes that are potential drug 
targets, and this process is accelerating rapidly as a result of the progress made, e.g., in 

sequencing the human genome. In addition to the growing set of available genes,

techniques such as the synthesis of combinatorial chemical libraries have created daunting 
numbers of candidate drugs for screening. In order to capitalize on these available 
materials, methods are needed that are capable of extremely fast and inexpensive analysis of 
gene expression levels. The utilization of a screen that can look at a multiplicity of genes in 

parallel, e.g. 5-100, can be used to overcome the deficits of these other screening
approaches. 

[0006] Automated, high-throughput rtPCR is one efficient approach to gene

expression analysis. This approach involves isolating RNA from cells, performing 



multiplexed rtPCR and then running out the samples on a capillary electrophoresis unit. For 
example, in the context of screening a compound or chemical library of 10,000 compounds 
in a cell-based assay, in which the relative expression levels for 20 genes are measured, the 
established process involves several steps including culturing the experimental cells, 
typically in microtiter-plate format, isolation of the RNA from these cells, selective

amplification using rtPCR, in targeted sets of 10 to 20 genes per amplification reaction, and 
analysis of the amplification products using capillary electrophoresis. 
[0007] This process is robust and incorporates an amplification scheme that couples 

the use of gene-specific and universal primers to lock in the relative gene ratios for all of the 

genes being amplified. The method also takes advantage of the newest generation of
automated, high-resolution capillary electrophoresis instruments. However, these 
instruments are capable of analyzing only a moderate set of samples in a given run. 
[0008] Nucleic acid microarrays are available, having the benefit of assaying for 

sample hybridization to a large number of probes in a highly parallel fashion. They can be 

used for quantitation of mRNA expression levels, and dramatically surpass the above
mentioned techniques in terms of multiplexing capability. These arrays comprise short 
DNA probes, such as PCR products, oligonucleotides, or cDNA products fixed onto a solid 
surface, which can then be used in a hybridization reaction with a target sample, generally a 
whole cell extract (see, for example, U.S. Patent Nos. 5,143,854 and 5,807,522; Fodor et al. 

(1991) Science 251:767-773; and Schena et al. (1995) Science 270:467-470), cellular RNA
sample, or cDNA sample corresponding to cellular RNAs. Microarrays can be used to 
measure the expression levels of several thousands of genes simultaneously, generating a 
gene expression profile of the entire genome of relatively simple organisms. Each reaction, 
however, is performed with a single biological sample against a very large number of gene 

probes. As a consequence, microarray technology does not facilitate high throughput

analysis of very large numbers of unique samples against an array of known probes. While 
both microarrays and real-time rtPCR techniques can be pressed into service in these 
important experimental areas, the fact of the matter is that neither method can do this work 
cost efficiently and with limited amounts of sample. As demand for gene expression data 

increases, it is desirable to further reduce costs per expression data point while increasing
throughput. However, the scientific focus for the process should remain the same, namely, 
the accurate analysis of moderate sets of genes (tens to hundreds) for many thousands of 
samples. 



[0009] Described herein are strategies for screening compound libraries involving 

carrying the rtPCR approach to a new level of throughput while reducing cost per data 
point. The approach involves replacing capillary electrophoresis readouts with microarray- 
format readouts. The advantages of the method are multiple and include (1) the ability to 
run thousands of samples in high throughput, e.g. in hours of time versus weeks, (2) the
possibility to work with very small amounts of RNA, e.g. sub-nanogram amounts, opening 
the door to multiplexed gene expression analysis of very small amounts of tissue (such as 
can be obtained using laser capture microdissection), and (3) the potential to run at a very low
cost per data point, e.g. 1 or a few pennies per gene. This conversion of readout format can 

be directly integrated into the current rtPCR process enabling a smooth transition to this

higher throughput platform. This change in methodology also modifies the existing platform 
for further advances based on the parallelization of sample processing in the microarray 
format, modifications that can lead to increased economies in reagent usage, time and labor, 
while maintaining a focus on measuring the gene expression response for moderate sets of 

genes across numerous biological samples.

SUMMARY OF THE INVENTION 
[0010] The present invention provides methods for screening compound libraries, 

e.g., to identify compounds with potential therapeutic utility. In the methods of the present 

invention, expression products derived from a plurality of biological samples or sources are 

simultaneously detected in a microarray format. Expressed RNA samples are obtained from
a plurality of biological samples which have been exposed, e.g., contacted or treated with 
members of a compound library, such as a library of chemical compositions. Following 
collection of the expressed RNA samples, by isolating total cellular RNA, or a population of 
RNAs such as messenger RNAs (mRNAs), a population of nucleic acids (or a subset of 

RNA species, i.e., polynucleotide sequences) corresponding to each of the samples is

arrayed to produce a nucleic acid array. Frequently, amplification products corresponding 
to the expressed nucleic acids are arrayed. Alternatively, RNA or cDNA corresponding to 
the expressed nucleic acids can be arrayed. Optionally, the nucleic acids undergo one or 
more purification steps prior to arraying.

[0011] A plurality of defined sequence probes, e.g., probes each having a unique

polynucleotide sequence, such as a set of genes, disease related targets, or the like, each of 
which is capable of giving rise to a different detectable signal is then hybridized 
simultaneously to the nucleic acid array. A defined sequence probe, in the context of the
invention, can be, e.g., an oligonucleotide, a cDNA, an amplification product or a restriction
fragment. In various embodiments, the defined sequence probes are capable of generating 
different signals produced by different fluorescent labels or fluorophores, chromophores, 
electrophores, radioactive nuclides, chemically reactive moieties, amplifiable signal 
elements and/or enzymes or ligands. Signals corresponding to hybridization of the defined
sequence probes to the nucleic acid array are then detected and, typically, quantitated.
Optionally, the signals are compared between probes or between samples. 
[0012] Amplification of the expressed nucleic acids is typically performed prior to 

arraying the nucleic acids. Commonly, the amplification step involves one or more nucleic 

acid amplifications, e.g., by a PCR, TMA, NASBA or RCA reaction. Optionally, the PCR is
an rtPCR that couples reverse transcription and amplification of the expressed RNA 
samples. The amplification can be either a global amplification or a selective (e.g., target 
specific) amplification of one or more species in the expressed RNA sample(s). For 
example, amplification can be performed by multiplex PCR using a plurality of gene 

specific primers. Optionally, the multiplex PCR also includes a universal or semi-universal
primer. In some embodiments, the gene specific primers also include a universal priming 
sequence (universal primer). A multiplex PCR in the context of the invention results in 
amplification of a plurality of nucleic acid species or products, typically between about 5 
and about 100 different polynucleotide sequences, or between about 10 and about 50 

polynucleotide sequences. Each expressed RNA sample can be amplified in two or more
target specific amplification arrays, and, for example, spatially arrayed in two or more 
locations on a physical array. Optionally, a plurality of defined sequence probes each of 
which specifically hybridizes to the products of a different target specific amplification 
reaction is hybridized to the array. In some embodiments, amplification products are pooled 

for arraying.

[0013] Optionally, a post-hybridization amplification step can be performed to 

increase the signal to noise ratio and increase sensitivity of detection of the signal 
corresponding to hybridization of the defined sequence probes and the nucleic acid array. 
Amplification can be facilitated by the inclusion of an amplifiable signal element into the 
probe. In some embodiments, the amplifiable signal element is an oligonucleotide sequence
that can be amplified, e.g., by branched DNA amplification (BDA), by rolling circle 
amplification (RCA), by using DNA dendrimer probes, or variations of these procedures.
Alternatively, the signal can be amplified by an enzymatic or catalytic reaction that gives
rise to a detectable product. 

[0014] In various embodiments of the invention, expressed RNA samples for 

analysis are obtained from a variety of biological sources or samples which have been 
exposed to or treated with members of a library of compositions or agents of potential

therapeutic value. A biological sample can be either prokaryotic or eukaryotic, and can be 
cells, such as primary cells or a cell line, e.g., an immortalized cell line. The choice of cell
lines is typically determined by the nature of the organism or cell which is the target of the 
therapeutic agent sought in the screening endeavor. Alternatively, a biological sample can 

be a tissue or organ biopsy, or, in some cases, an organism, or collection of organisms.
Prior to obtaining the expressed RNA sample from the biological sample, the biological
sample is treated with, contacted with or exposed to one or more agents, compounds or
compositions. For example, subpopulations of a cell line can each be treated
with a different member of a collection of compositions, e.g., a chemical or compound 

library.

[0015] As numerous samples can be analyzed simultaneously, favorable 

embodiments involve obtaining and analyzing expression data from a large number of 
biological samples, e.g., greater than about 100 samples, each of which has been treated 
with (or contacted with or exposed to) a member of a compound library. Usually, each 
biological sample is treated with a different member of the compound library. Typically,
more than 500 samples are arrayed and analyzed. Commonly, in excess of 1000 samples 
are simultaneously arrayed and analyzed. Frequently, in excess of about 2000 samples are 


analyzed, and in certain embodiments, greater than about 10,000 biological samples are 
analyzed. Alternatively, the methods are directed toward simultaneous analysis of 

expression data from a small number of samples, e.g., from between 2 and about 20
samples, or a moderate number of samples, such as between about 20 and about 100 
samples.
[0016] A variety of nucleic acid array formats can be employed in the context of the 

present invention. In some embodiments, the arrays are solid phase arrays, i.e., the nucleic 

acids are arrayed on one or more solid phase surfaces. In some embodiments, the nucleic
acids corresponding to expressed RNA samples are arrayed on a two dimensional solid 
phase surface. In alternative embodiments, the nucleic acids are arrayed on a plurality of 
solid phase surfaces, such as beads, spheres, pins, or optical fibers. 



[0017] Solid phase array surfaces can include a variety of materials, and in various

embodiments of the invention, the array surface is composed, e.g., of glass, coated glass, 
silicon, porous silicon, nylon, ceramic or plastic. 

[0018] An aspect of the invention relates to methods for determining relative gene 

expression for a plurality of expression products in two or more biological samples, e.g., a
control sample and one or more biological samples which have been exposed to or 
contacted with a member of a compound library. These methods involve obtaining 
expressed RNA samples from a plurality of different biological samples and arraying sets of 
nucleic acids corresponding to the expressed RNA samples, or a subset of species in the 

expressed RNA samples. A plurality of defined sequence probes, each comprising a

different polynucleotide sequence, and each of which is capable of generating a different 
detectable signal is then hybridized to the array, and a signal corresponding to the 
hybridization between the probes and the array is detected and quantitated. Hybridization 
signals are then compared between biological samples for a plurality of the defined 

sequence probes.

[0019] In the methods for screening a compound library to identify a compound 

with a physiological effect on a biological sample, the biological samples can include 
members of a population of experimental organisms, multiple subpopulations of a primary 
cell isolate or cell line, tissue samples (e.g., sub-samples of a tissue, samples of identical 

tissues, or samples of related tissues) or extracts made from tissue(s) or cells. A biological
sample can be either prokaryotic or eukaryotic. A compound library can be a chemical or 
biochemical (or combined) composition library, such as a compound collection library, a 
combinatorial chemical library, a scaffold-focused chemical library, a target focused 
chemical library, an antibody library, a biological library, a natural product library, an 

antisense agent library, an iRNA library, a siRNA library, a ribozyme library, a peptide
library, or a combinatorial nucleic acid oligomer library.

[0020] Typically an expressed RNA sample is also obtained from an untreated

biological sample (or a zero time point sample, or other control sample). Nucleic acids 
corresponding to the expressed RNA samples are arrayed to produce a nucleic acid array, 
and a plurality of defined sequence probes each capable of giving rise to a different

detectable signal is hybridized to the array. Signals corresponding to hybridization between 
the probes and the array are quantitated and differences in expression between treated and
control hybridization signals are evaluated to identify compounds that exert a physiological
effect on the biological sample, e.g., by exerting an effect on one or more biological targets. 
[0021] Quantitated hybridization signals can differ either qualitatively or 

quantitatively from one or more control hybridization signals (e.g., an internal control 
hybridization signal), and can be either increased or decreased relative to a control

hybridization signal. For example, one or more defined sequence probes corresponding to
genes of interest as well as a control probe, such as a probe corresponding to a 
housekeeping gene, are hybridized to the array. The resulting hybridization signals are 
detected, quantitated and the relative expression between the gene(s) of interest and the 

control is determined. In the analysis of multiple duplicate arrays, consistency can be
maintained by differing the gene specific probes between arrays while hybridizing the 
multiple arrays to the same control, e.g., housekeeping, gene. In some embodiments, 
differences between the hybridization signals are evaluated by performing at least one 
statistical analysis. For example, a quantitative difference can be at least one standard 

deviation, or two standard deviations from a reference or control hybridization signal.
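
The comparison described in paragraph [0021] can be sketched as follows. This is a minimal illustration only, not the procedure of the invention: the gene names (`ACTB` as the housekeeping control, `GENE_A`, `GENE_B`), the signal values, and the helper functions `relative_expression` and `flag_differences` are all hypothetical, and the threshold of two standard deviations is one of the examples given in the text.

```python
from statistics import mean, stdev

def relative_expression(signals, control="ACTB"):
    """Divide each gene signal by the control-probe signal from the same array position."""
    ref = signals[control]
    if ref <= 0:
        raise ValueError("control signal must be positive")
    return {gene: value / ref for gene, value in signals.items() if gene != control}

def flag_differences(treated, controls, n_sd=2.0):
    """Return {gene: z} for genes at least n_sd standard deviations from the control mean."""
    flagged = {}
    for gene, value in treated.items():
        reference = [c[gene] for c in controls]
        mu, sd = mean(reference), stdev(reference)
        if sd == 0:
            continue  # no spread in the controls, so no z-score can be formed
        z = (value - mu) / sd
        if abs(z) >= n_sd:
            flagged[gene] = z
    return flagged

# Hypothetical raw signals: three untreated control samples and one treated sample.
untreated_spots = [
    {"ACTB": 1000.0, "GENE_A": 500.0, "GENE_B": 200.0},
    {"ACTB": 1100.0, "GENE_A": 572.0, "GENE_B": 209.0},
    {"ACTB": 900.0, "GENE_A": 432.0, "GENE_B": 189.0},
]
treated_spot = {"ACTB": 1000.0, "GENE_A": 900.0, "GENE_B": 200.0}

controls = [relative_expression(s) for s in untreated_spots]
hits = flag_differences(relative_expression(treated_spot), controls)
print(sorted(hits))
```

Normalizing to the control probe before comparing samples means that differences in the amount of nucleic acid deposited at each array position do not by themselves register as expression changes.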
[0022] The methods of the invention optionally involve recording data 

representative of one or more of the hybridization signals (e.g., indicative of an absolute or 
relative quantitation of a hybridization signal for the plurality of samples) in a database. 
Commonly, the database is in a computer or computer readable medium. 
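
As one possible illustration of recording hybridization signals in a database, the sketch below uses an in-memory SQLite database. The table name `hybridization_signal`, its columns, the sample and compound identifiers, and the `ACTB` control probe are assumptions made for this example; the disclosure does not specify a schema.

```python
import sqlite3

def record_signals(conn, rows):
    """Store (sample_id, compound_id, probe, signal) rows, one row per sample and probe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hybridization_signal ("
        "sample_id TEXT, compound_id TEXT, probe TEXT, signal REAL, "
        "PRIMARY KEY (sample_id, probe))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO hybridization_signal VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
record_signals(conn, [
    ("S001", "CMPD-0001", "GENE_A", 900.0),
    ("S001", "CMPD-0001", "ACTB", 1000.0),
    ("S002", "CMPD-0002", "GENE_A", 510.0),
    ("S002", "CMPD-0002", "ACTB", 1020.0),
])

# Relative quantitation: GENE_A signal divided by the control signal of the same sample.
ratios = conn.execute(
    "SELECT g.sample_id, g.signal / c.signal FROM hybridization_signal g "
    "JOIN hybridization_signal c ON c.sample_id = g.sample_id AND c.probe = 'ACTB' "
    "WHERE g.probe = 'GENE_A' ORDER BY g.sample_id"
).fetchall()
print(ratios)
```

Storing the absolute signals and deriving relative values at query time keeps both forms of quantitation mentioned in the text available from the same records.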

[0023] The invention also provides hybridization systems including an array of

nucleic acids corresponding to a plurality of expressed RNA samples each of which is 
obtained from a different biological sample which has been contacted with a member of a
compound library, and a plurality of defined sequence probes each capable of generating a 
different detectable signal. The nucleic acid array can include any one or more of RNA, 

cDNA, or amplification products corresponding to expressed RNAs from biological

samples. The plurality of defined sequence probes can be any set of probes having different 
polynucleotide sequences. In certain favorable embodiments, the probes include a set of 
genes, such as genes that are disease related targets. 

[0024] The invention also includes integrated systems including the hybridization 

systems of the invention and components or modules for performing the methods of the
invention, as well as kits incorporating components for the systems and methods of the 
invention.

BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Figure 1 schematically illustrates arraying of nucleic acids corresponding to 

expressed RNAs derived from multiple biological samples, and hybridizing with a plurality 

of differently labeled probes. 

[0026] Figure 2 schematically illustrates duplication of an array to increase probe

diversity. 

[0027] Figure 3 schematically illustrates library screening using capillary 

electrophoresis technology. 

[0028] Figure 4 schematically illustrates library screening by hybridizing a plurality 

of differently labeled probes to a nucleic acid microarray.

[0029] Figure 5 schematically illustrates a global amplification approach for nucleic 

acid arrays. 

[0030] Figure 6 schematically illustrates a protocol involving on chip signal 

amplification. 

[0031] Figure 7 schematically illustrates a procedure for isolating RNA on the array

coupled with signal amplification. 

[0032] Figures 8A and B illustrate a selective amplification protocol and 

amplification of products in a multiplex amplification reaction, respectively. 
[0033] Figure 9 graphically displays expression profiles for a plurality of genes, of 

cells treated with a chemical compound (emetine).

[0034] Figure 10 graphically displays the linearity and dynamic range of an

amplification reaction relative to β-actin.

[0035] Figure 11 illustrates data collected from an exemplary microarray

experiment. Intensity of fluorescence indicates quantitative hybridization of a labeled
probe with increasing concentrations of multiplexed PCR amplification product including

the target. 


DETAILED DESCRIPTION 
[0036] The present invention involves screening compound libraries for drug 

discovery by "flipping" the standard microarray paradigm. Microarray formats typically 

involve the spatial organization of numerous probe sequences on a solid phase surface, and

application of a single labeled nucleic acid sample to the microarray. A signal 

corresponding to the hybridization between the labeled test sample and the probe array is 

then detected, most commonly using automated array detection devices. This technology 



permits the analysis of gene expression for numerous query sequences across a single 
biological sample. Multiple duplicate arrays are tested with multiple samples, or the same 
array is contacted sequentially with multiple nucleic acid samples to analyze multiple 
biological samples. In the context of drug discovery efforts, this permits a broad survey of a 
compound's effects on a biological system from which the RNA sample is derived.

However, this approach is prohibitively expensive for the purpose of evaluating the effects of
numerous compounds. 

[0037] In contrast, the present invention provides methods for analyzing gene 

expression in which nucleic acids corresponding to RNA samples derived from a number of 
biological samples, which have been exposed to (or contacted or treated with) members of a
compound library are assembled into an array, and multiple gene specific probes are 
hybridized to these sample arrays. In other words, the samples are placed on the surface 
and the probes are in solution. 

[0038] Standard microarrays differentiate between the genes being monitored by 

assigning a unique spatial placement to each of the gene specific probes on the microarray
surface. The methods described herein for "flipping" the microarray distinguish between
different gene specific probes by differential labeling of the individual probes (e.g., by 
labeling different probes with fluorescent labels that can be uniquely identified by their 
absorption/emission properties). While this approach limits the number of probe sequences 

(e.g., genes) that can be analyzed in any single array reaction, it facilitates the use of the
spatial arraying dimensions for the high level of multiplexing of samples (e.g., samples 
treated with members of a large compound library) in a single experiment. Automated, or 
semi-automated duplication procedures are employed to increase the number of sequences
analyzed as desired, according to the number of compounds to be screened. 
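
The trade-off described above can be sketched with a short planning routine: if only a limited number of labels can be distinguished in one hybridization, the gene specific probes are divided among duplicate arrays. The function `plan_duplicate_arrays`, the gene names, the choice of five distinguishable labels, and the decision to reserve one label on every array for a shared control probe (here called `ACTB`) are assumptions for illustration only.

```python
def plan_duplicate_arrays(genes, n_labels, control="ACTB"):
    """Split gene probes across duplicate arrays, reserving one label per array for a control."""
    if n_labels < 2:
        raise ValueError("need one label for the control and at least one for a gene probe")
    per_array = n_labels - 1  # labels left for gene specific probes on each array
    return [
        [control] + genes[i:i + per_array]
        for i in range(0, len(genes), per_array)
    ]

# Hypothetical screen: 20 genes of interest, 5 distinguishable labels per hybridization.
genes = [f"GENE_{i:02d}" for i in range(1, 21)]
plan = plan_duplicate_arrays(genes, n_labels=5)

for number, probes in enumerate(plan, start=1):
    print(f"array {number}: {', '.join(probes)}")
```

Under these assumptions, 20 genes with four gene labels per array require five duplicate arrays, each carrying the same arrayed samples.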

[0039] Gene expression profiles of biological samples exposed to members of a

compound library are generated, allowing the practitioner to determine, in a gene specific 
manner, the effects of the individual members of the library on a physiological system or 
biological sample of interest. However, several other applications are also possible, as 
would be apparent to one skilled in the art from a reading of this disclosure. For example, 

the methods of the present invention can be used to investigate the profile and expression
levels of one or more members of complex gene families, e.g., in response to treatment with
compositions under evaluation as potential therapeutic agents, with respect to both 
therapeutic and toxicologic properties. As an illustration, cytochrome P-450 isozymes form
a complex set of related enzymes that are involved in detoxification of foreign substances in
the liver (Ortiz de Montellano (1995) Cytochrome P450 Structure Mechanism and 
Biochemistry, Plenum Press, New York). The various isozymes in this family have been 
shown to be specific for different substrates. Design of target-specific probes that hybridize 
to variant regions in the genes provides an assay by which their relative levels of induction
in response to drug treatments can be monitored. Other examples include monitoring 
expression levels of alleles with allele-specific probes, or monitoring mRNA processing 
with probes that specifically hybridize to a spliced or unspliced region, or to splice variants. 
One skilled in the art could envision other applications of the present invention that would 
provide a method to monitor genetic variations or expression mechanisms, e.g., relevant to
responses to drug efficacy or toxicity. 

DEFINITIONS 

[0040] Before describing the present invention in detail, it is to be understood that 

this invention is not limited to particular devices or biological systems, which can, of 
course, vary. It is also to be understood that the terminology used herein is for the purpose
of describing particular embodiments only, and is not intended to be limiting. As used in 
this specification and the appended claims, the singular forms "a", "an" and "the" include 
plural referents unless the content clearly dictates otherwise. 

[0041] Unless defined otherwise, all technical and scientific terms used herein have 

the same meaning as commonly understood by one of ordinary skill in the art to which the
invention pertains. Although any methods and materials similar or equivalent to those 
described herein can be used in the practice or testing of the present invention, the currently 
preferred materials and methods are described herein. In describing and claiming the 
present invention, the following terminology will be used in accordance with the definitions 
set out below.

[0042] "Expression products" are ribonucleic acid (RNA) or polypeptide products

transcribed or translated, respectively, from a genome or other genetic element. Commonly, 
expression products are associated with genes having biological properties. Thus, the term
"gene" refers to a nucleic acid sequence associated with a biological property, e.g.,
encoding a gene product with physiologic properties. A gene optionally includes sequence
information required for expression of the gene (e.g., promoters, enhancers, etc.). 
[0043] The term "gene expression" refers to transcription of a gene into an RNA 

product, and optionally to translation into one or more polypeptide sequences. The term
"transcription" refers to the process of copying a DNA sequence of a gene into an RNA
product, generally conducted by a DNA-directed RNA polymerase using DNA as a 
template. 

[0044] The term "nucleic acid" refers to a polymer of ribonucleic acids or 

deoxyribonucleic acids, including RNA, mRNA, rRNA, tRNA, small nuclear RNAs,
cDNA, DNA, PNA, RNA/DNA copolymers, or analogues thereof. Nucleic acid may be 
obtained from a cellular extract, genomic or extragenomic DNA, viral RNA or DNA, or 
artificially/chemically synthesized molecules. 

[0045] The term "RNA" refers to a polymer of ribonucleic acids, including RNA, 

mRNA, rRNA, tRNA, and small nuclear RNAs, as well as to RNAs that comprise

ribonucleotide analogues to natural ribonucleic acid residues, such as 2'-O-methylated
residues. 

[0046] The term "cDNA" refers to complementary or "copy" DNA. Generally 

cDNA is synthesized by a DNA polymerase using any type of RNA molecule (e.g., 
typically mRNA) as a template. Alternatively, the cDNA can be obtained by directed
chemical syntheses. 

[0047] The term "amplified product" or "amplified nucleic acid" refers to a nucleic 

acid generated by any method of nucleic acid amplification. 

[0048] The term "complementary" refers to nucleic acid sequences capable of base-pairing

according to the standard Watson-Crick complementarity rules, or being capable of
hybridizing to a particular nucleic acid segment under relatively stringent conditions. 
Nucleic acid polymers are optionally complementary across only portions of their entire 
sequences. 

[0049] The term "hybridization" refers to duplex formation between two or more 

polynucleotides, e.g., to form a double-stranded nucleic acid. The ability of two regions of

complementarity to hybridize and remain together depends on the length and continuity of

the complementary regions, and the stringency of hybridization conditions. 

[0050] A "defined sequence probe" is a nucleic acid probe having a single 

polynucleotide sequence. 

[0051] The term "synthetic probe" is used to indicate that the probe is produced by
one or more synthetic or artificial manipulations, e.g., restriction digestion, amplification,
oligonucleotide synthesis, cDNA synthesis, and the like.

[0052] The term "label" refers to any detectable moiety. A label may be used to

distinguish a particular nucleic acid from others that are unlabeled, or labeled differently, or 
the label may be used to enhance detection. 

[0053] The term "primer" refers to any nucleic acid that is capable of hybridizing at 

its 3' end to a complementary nucleic acid molecule, and that provides a free 3' hydroxyl
terminus which can be extended by a nucleic acid polymerase. 

[0054] The term "template" refers to any nucleic acid polymer that can serve as a 

sequence that can be copied into a complementary sequence by the action of, for example, a 
polymerase enzyme. 

[0055] The term "target," "target sequence," or "target gene sequence" refers to a

specific nucleic acid sequence, the presence, absence or abundance of which is to be 
determined. In a preferred embodiment of the invention, it is a unique sequence within the 
mRNA of an expressed gene. 

[0056] The term "target-specific primer" refers to a primer capable of hybridizing 

with its corresponding target sequence. Under appropriate conditions, the hybridized primer
can prime the replication of the target sequence. 

[0057] The term "semi-universal primer" refers to a primer that is capable of

hybridizing with more than one, but not all, of the target-specific primers in a multiplexed 
reaction. 

[0058] The term "universal primer" refers to a replication primer comprising a

universal sequence. 

[0059] The term "universal sequence" refers to a sequence contained in a plurality 

of primers, but preferably not in a complement to the original template nucleic acid (e.g., 
the target sequence), such that a primer composed entirely of universal sequence is not 

capable of hybridizing with the template.

[0060] The term "reference sequence" refers to a nucleic acid sequence serving as a 

target of amplification in a sample that provides a control for the assay. The reference may 
be internal (or endogenous) to the sample source, or it may be externally added (or
exogenous) to the sample. An external reference may be either RNA, added to the sample
prior to reverse transcription, or DNA (e.g., cDNA), added prior to PCR amplification.

[0061] The term "multiplex reaction" refers to a plurality of reactions conducted 

simultaneously in a single reaction mixture, and includes, for example, multiplex 
amplification and multiplex hybridization reactions.

[0062] The term "multiplex amplification" refers to a plurality of amplification

reactions conducted simultaneously in a single reaction mixture. 

[0063] In the context of the present invention, the term "simultaneously" means that 

the reaction, e.g., a hybridization reaction, occurs at substantially the same time. For 
example, reagents to be hybridized, such as multiple defined sequence probes, are contacted
at the same time and/or in the same solution with target nucleic acids, e.g., an array of
nucleic acids.

[0064] In the context of the present invention, an "amplifiable signal element" is a 

component of a probe that facilitates amplification of a signal following hybridization of the 

probe to a target sequence.

[0065] The term "gene expression data" refers to one or more sets of data that 

contain information regarding different aspects of gene expression. The data set optionally 
includes information regarding: the presence of target transcripts in cell or cell-derived
samples; the relative and absolute abundance levels of target transcripts; the ability of
various treatments to induce expression of specific genes; and the ability of various
treatments to change expression of specific genes to different levels.

[0066] The term "quantitating" means to assign a numerical value, e.g., to a 

hybridization signal. Typically, quantitating involves measuring the intensity of a signal 
and assigning a corresponding value on a linear or exponential numerical scale. 

[0067] The term "relative abundance" or "relative gene expression levels" refers to

the abundance of a given species relative to that of a second species. Optionally, the second 
species is a reference sequence. 
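
As an illustration of the relative abundance definition, the following Python sketch divides each target signal by the signal of a reference sequence measured in the same sample. The function name, the dictionary layout, and the example intensities are assumptions made for illustration only and are not taken from the disclosure.

```python
def relative_abundance(signals, reference):
    """Return each target's signal divided by the reference signal.

    signals   -- dict mapping a sequence name to its measured signal intensity
    reference -- name of the reference sequence (a hypothetical key in signals)
    """
    ref_signal = signals[reference]
    if ref_signal <= 0:
        raise ValueError("reference signal must be positive")
    # Report every species except the reference itself.
    return {
        name: value / ref_signal
        for name, value in signals.items()
        if name != reference
    }


# Hypothetical intensities for one arrayed sample (arbitrary units).
spot = {"gene_a": 1200.0, "gene_b": 300.0, "reference": 600.0}
ratios = relative_abundance(spot, "reference")
```

Expressing each signal as a ratio to a reference measured at the same location is one simple way to compare samples; other normalization schemes are equally possible.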

[0068] The term "treatment" refers to the process of subjecting (i.e., treating) one or 

more cells, cell lines, tissues, or organisms to a condition, substance, or agent (or 

combinations thereof) that may cause the cell, cell line, tissue or organism to alter its gene
expression profile. A treatment may include a range of chemical concentrations and 
exposure times, and replicate samples may be generated. The term "chemical treatment" 
refers to the process of exposing (or contacting) a cell, cell line, tissue or organism to (or 
with) a chemical or biochemical compound (or library of compounds) that has/have the 

potential to alter its gene expression profile.

[0069] The term "platform" refers to the instrumentation method used for sample

preparation, amplification, product separation, product detection, or analysis of data 
obtained from samples.

[0070] The terms "microplate," "culture plate," and "multiwell plate"
interchangeably refer to a surface having multiple chambers, receptacles or containers and
generally used to perform a large number of discrete reactions simultaneously.

[0071] The term "high throughput format" refers to analyzing more than about 10 

samples per hour, preferably about 50 or more samples per hour, more preferably about 100
or more samples per hour, most preferably about 250, about 500, about 1000 or more 
samples per hour. 

[0072] The term "miniaturized format" refers to procedures or methods conducted at 

submicroliter volumes, including on both microfluidic and nanofluidic platforms. 

OVERVIEW

[0073] A schematic outline of an exemplary method of the invention is illustrated in 

Figure 1. Multiple RNA samples are obtained from biological samples that have been treated
with members of a library of compositions of interest in a screening effort aimed at
identifying potential therapeutic agents. Usually such libraries are large collections of
compounds or compositions, ranging from hundreds to many thousands of different
compositions, e.g., from about 500 to many thousands of compounds. Typically, RNA
samples arranged (or arrayed) in microtiter plates provide the templates for generating a
series of nucleic acid (NA) products that are then arrayed in one or more microarrays, for
example, in the format of microarray slides. The nucleic acid products in the form of
amplification products are commonly produced by rtPCR. For example, in a
favorable embodiment the rtPCR performs a multiplexed targeted (e.g., target or gene
specific) amplification reaction. Alternatively, RNA or cDNA products are arrayed.
Typical microarray slides contain between a thousand and 20,000 nucleic acid "spots." 
Each nucleic acid sample is assigned a unique location on the microarray. Therefore, as 

many as 20,000 different nucleic acid, e.g., amplification product, samples (corresponding
to expressed RNAs from as many as 20,000 unique biological samples, e.g., samples treated 
with 20,000 individual members of a composition library) can be arrayed and analyzed on a 
single microarray slide. In the example shown in Figure 1, 4 different genes are analyzed 
using 4 different defined sequence oligonucleotide probes. The different probes are labeled 

with 4 different labels that can be uniquely detected and quantitated in the array reader.

[0074] The ability to analyze 4 different genes for 20,000 samples on a given slide 

may seem limited in terms of gene depth. However, it is trivial to replicate a given slide
using existing slide printing instruments to generate 100 or more slides per set
of samples. This replication process is shown schematically in Figure 2. The use of
replicate microarrays makes it possible to analyze numerous different query sequences
against the same RNA samples. The processes of printing, probing and scanning the
microarray plates are nearly parallel; therefore, it takes nearly the same time and
resources to analyze 20 (or 100) plates as it does a single plate.

[0075] For comparison, an established process utilizing capillary electrophoresis is 

shown schematically in Figure 3. The capillary electrophoresis process is contrasted with 
the methods of the present invention in the context of screening a compound or chemical 
library of 10,000 compounds in a cell-based assay, in which the relative expression levels 

for 20 genes are measured, providing 200,000 data points. The established capillary
electrophoresis process involves several steps, including culturing of the experimental cells,
typically in microtiter plate format (i.e., 10,000 compounds in 100 plates), isolation of the
RNA from these cells, selective amplification using rtPCR, in targeted sets of 10 to 20
genes per amplification reaction, and analysis of the amplification products using capillary
electrophoresis.

[0076] This process is robust and incorporates an amplification scheme that couples 

the use of gene-specific and universal primers to lock in the relative gene ratios for all of the 
genes being amplified. The method also takes advantage of the newest generation of 
automated, high-resolution capillary electrophoresis instruments. But while these 
instruments are state of the art, they still only run a moderate set of samples, e.g., 2x16
samples for 20 genes, in a given run, necessitating approximately 300 runs of 30 minutes
each. Thus, capillary analysis using current capabilities, e.g., on one ABI 3100 analyzer, takes
more than one week.
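
The run count cited above can be checked with a short calculation. The Python sketch below uses only the figures given in the text (10,000 samples, 2x16 samples per run, 30 minutes per run); the function name is hypothetical, and the result covers instrument run time only, without any allowance for loading, downtime, or non-continuous operation.

```python
import math


def capillary_workload(n_samples, samples_per_run, minutes_per_run):
    """Return (number of runs, total instrument hours) for a single analyzer."""
    runs = math.ceil(n_samples / samples_per_run)
    hours = runs * minutes_per_run / 60
    return runs, hours


# Figures from the text: 10,000 samples, 2 x 16 samples per run, 30 minutes per run.
runs, hours = capillary_workload(10_000, 2 * 16, 30)
```

Under these assumptions the sketch gives 313 runs, in line with the approximately 300 runs cited, and about 156 hours of pure run time; the elapsed time in practice also depends on how many hours per day the instrument is operated.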

[0077] An exemplary method for screening a compound library according to the 

present invention is shown schematically in Figure 4. The process resembles the existing
capillary electrophoresis process in that it involves RNA isolation, and uses an rtPCR-based
amplification scheme, in which amplification is performed in, e.g., 384 well plates. The 
process differs from the current methodology following amplification of the RNA sample, 
or alternative production of a nucleic acid sample corresponding to the RNA sample. 
Instead of using a capillary electrophoresis instrument to detect and quantitate the amplified
products, the process involves spotting all of the amplified products onto microarray slides.
Depending on the number of genes to be analyzed, the amplified products are deposited
onto one or more slides. For example, if one wishes to analyze 20 genes coming from a
single rtPCR reaction, one needs to deposit or "print" the amplified products down onto 7
microarray slides, wherein each array is used to analyze three genes plus a control or 
reference gene. 
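
The slide count in this example follows from a simple ceiling division, sketched below in Python. The function and parameter names are hypothetical; the defaults (four distinguishable probes per slide, one of which is reserved for the control or reference gene) are taken from the example in the text.

```python
import math


def slides_needed(n_genes, probes_per_slide=4, reference_probes=1):
    """Number of replicate slides required to query n_genes target genes.

    Each slide is assumed to carry probes_per_slide distinguishable labels,
    of which reference_probes are reserved for control or reference genes.
    """
    targets_per_slide = probes_per_slide - reference_probes
    if targets_per_slide <= 0:
        raise ValueError("at least one probe per slide must be a target probe")
    return math.ceil(n_genes / targets_per_slide)


# 20 genes at three target genes plus one reference gene per slide.
slides = slides_needed(20)
```

With three target genes per slide, 20 genes require 7 slides, the last of which carries only two target probes.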

[0078] These modifications of existing procedures lead to a dramatic increase in 

throughput. For example, 10,000 samples can be run through the post-PCR process in a
single 24-hour period versus the one to two weeks necessary to run all of the samples on a
single capillary electrophoresis instrument. In scenarios where the number of genes to be
analyzed increases, this differential grows even larger. For example, the analysis of 100
genes would increase the time to completion of the analysis in the capillary format 5-fold to
5-10 weeks, while the time associated with running 35 microarrays remains a matter of a
couple of days. The cost savings are also significant, with the reagent costs associated with
running microarrays being conservatively estimated to be less than half that of capillaries.
Additionally, the present invention reduces the overall number of steps involved in 
performing multi-gene expression analysis on numerous biological samples. 

[0079] The substitution of microarrays also offers several additional benefits. As

illustrated in Figure 5, because the transition to microarrays eliminates the need to size 
individual PCR products, a universal or global mRNA amplification scheme (e.g., as 
described by Kurn or Eberwine, infra, or by Rolling Circle Amplification) can be utilized. 
The advantage of using a global amplification scheme is most apparent in cases where one 

wishes to regularly analyze more than 20-30 genes, the practical limit for PCR, from a
single sample. 

[0080] Figure 6 illustrates an exemplary strategy in which post-hybridization signal
amplification is performed to increase sensitivity of analysis, e.g., with genes expressed at
low levels. In an alternative embodiment, post-hybridization signal amplification replaces
sample amplification, dramatically reducing reagent and labor costs associated with running
10,000 individual amplification reactions, e.g., PCR. In these embodiments, arraying, 
probing and signal amplification can be performed in less than 24 hours for 10,000 or more 
compounds. 

[0081] One advantage of a signal amplification scheme is that amplification is 

performed late in the process after compression of the sample set from, e.g., 25 384-well
microtiter plates to 7 microarray slides. This compression in sample format converts the
amplification from 10,000 individual reactions to just 7, reducing sample-to-sample
variability in the data, since the treatment conditions between samples are more nearly
identical.

[0082] Alternatively or additionally, the RNA isolation process can be modified to 

reduce processing. The utilization of a microarray format makes it possible to create a 
miniaturized and highly simplified approach to mRNA capture and isolation, as shown in
Figure 7. Glass slides used to create microarrays are routinely coated with different 
compounds and chemical functionalities to alter the binding and adherence properties of the 
slide. Through the use of existing chemistries it is possible to coat glass slides with 
polythymidine (polyT). Crude cell lysates (or some fraction thereof, containing the mRNA) 

can be directly spotted onto the polyT-coated slides. The mRNA is annealed to the polyT,
and the unbound material is washed away. Thus, the entire set of steps for processing, 
handling and detecting the RNA occurs on the microarray slide. This simplification of the 
process represents a dramatic reduction in sample handling steps and reagent usage and 
creates a gene expression analysis platform that is capable of very high throughputs and can 

be run at an extremely low cost per data point.

SCREENING LIBRARIES OF COMPOSITIONS 

[0083] The present invention provides methods for identifying compounds, e.g.,
chemicals, that have a physiological effect on one or more physiological processes in a
biological system, such as a cell (e.g., a cell line in culture), tissue or organism. In one
favorable embodiment, a chemical or compound library is screened according to the
methods of the invention. One favorable application of the present invention is in the 
screening of large compound libraries for the purpose of identifying agents with potential 
therapeutic application, e.g., activity relevant to a physiologic, metabolic or genetic pathway 
related to preventing or treating a disease state or condition. Alternative embodiments 

include screening compound libraries for compounds for purposes other than identifying
therapeutic agents, e.g., agents with effects on a biological system unrelated to a disease 
state. Typically, biological samples, such as samples of a cell line in culture, are exposed 
to, or treated, e.g., contacted, with a member of a chemical or compound library. Following 
exposure, an expressed RNA sample is recovered from each treated sample, and analyzed as 

described herein. Typically, a large number of expressed RNA samples derived from
biological samples, for example, a large number of samples each corresponding to a 
population of the same cell line, each of which has been treated with a different member of 
the compound library, are spatially arrayed, e.g., on a glass microarray slide, and hybridized
to a plurality of probes of interest, e.g., corresponding to genes encoding components of a
biochemical pathway of interest. Usually, anywhere from about 100 (or 200, or 500) to
several thousand, e.g., about 10,000 or about 20,000, different expressed RNA samples
corresponding to samples (i.e., populations) of a cell line, each of which is exposed to one
(or more) members of a library of compositions, are arrayed and analyzed according to the
methods of the invention.

[0084] For example, a cell or cell line can be treated with or exposed to one or more 

characterized or uncharacterized chemical libraries (chemical compound libraries), chemical 
or biochemical constituents, e.g., pharmaceuticals, pollutants, DNA damaging agents, 

oxidative stress-inducing agents, pH-altering agents, membrane-disrupting agents,
metabolic blocking agents, chemical inhibitors, cell surface receptor ligands, antibodies,
transcription promoters/enhancers/inhibitors, translation promoters/enhancers/inhibitors,
protein-stabilizing or destabilizing agents, various toxins, carcinogens or teratogens, 
proteins, lipids, or nucleic acids. The libraries include combinatorial chemical libraries, 

scaffold-focused chemical libraries, target-focused chemical libraries, biological libraries,
natural product libraries, antisense agent libraries, iRNA libraries, siRNA libraries, 
ribozyme libraries, peptide libraries and combinatorial nucleic acid oligomer libraries, etc. 
As will be appreciated by one skilled in the art, the number of classes of compounds and/or 
compound analogues that can be screened for a physiological effect on a biological sample 

is extensive, and includes, but is not limited to, the following groups of compounds: ACE
inhibitors; anti-inflammatory agents; anti-asthmatic agents; antidiabetic agents; anti- 
infectives (including but not limited to antibacterials, antibiotics, antifungals, 
antihelminthics, antimalarials and antiviral agents); analgesics and analgesic combinations; 
apoptosis inducers or inhibitors; local and systemic anesthetics; cardiac and/or 

cardiovascular preparations (including angina and hypertension medications, anticoagulants,
anti-arrhythmic agents, cardiotonics, cardiac depressants, calcium channel blockers and beta 
blockers, vasodilators, and vasoconstrictors); chemotherapies, including various 
antineoplastics; immunoreactive compounds, such as immunizing agents, 
immunomodulators, immunosuppressives; appetite suppressants, allergy medications, 

arthritis medications, antioxidants, herbal preparations and active component isolates;
neurologically-active agents including Alzheimer's and Parkinson's disease medications,
migraine medications, adrenergic receptor agonists and antagonists, cholinergic receptor 
agonists and antagonists, anti-anxiety preparations, anxiolytics, anticonvulsants,
antidepressants, anti-epileptics, antipsychotics, antispasmodics, psychostimulants, hypnotics,
sedatives and tranquilizers, and the like. 

[0085] In some applications, selection of the compounds used for treatment of the 

biological samples is made based on literature and knowledge of experts in the field of 
interest. In order to take full advantage of the comparative analysis approach to discerning
mechanism of response for a drug or composition and identifying new compositions, it is 
useful to analyze a selection of compositions including, but not limited to, a range of 
therapeutics (either approved or currently in clinical trials), therapeutic candidates, research 
chemicals, libraries of synthetic compositions, natural or biological compounds, herbal 

compositions, and other chemicals that potentially interact with one or more target
molecules or that appear to drive cells to a comparable phenotype(s). 
[0086] A number of tools and techniques can be used to treat cells in the context of 

the present invention. These techniques include, but are not limited to, transient treatments 
with chemicals that broadly stimulate activity and/or generally perturb the environment 

within the cell. By "stimulation" is meant a perturbation in the equilibrium state of the
biochemical and/or genetic pathways of the cell, and is not meant to be limited to an 
increase in concentration or biological activity. Examples of stimulatory agents, chemicals 
and treatments include, but are not limited to, oxidative stress, pH stress, pH altering agents, 
DNA damaging agents, membrane disrupters, metabolic blocking agents, and energy 

blockers. Additionally, cellular perturbation may be achieved by treatment with chemical
inhibitors, cell surface receptor ligands, antibodies, oligonucleotides, ribozymes and/or
vectors employing inducible, gene-specific knock-in and knock-down technologies. The
identity and use of stimulatory agents, chemicals and treatments are known to one of skill in 
the art. 

[0087] Examples of DNA damaging agents include, but are not limited to,

intercalation agents such as ethidium bromide; alkylating agents such as methyl 
methanesulfonate; hydrogen peroxide; UV irradiation, and gamma irradiation. Examples of 
oxidative stress agents include, but are not limited to, hydrogen peroxide, superoxide 
radicals, hydroxyl free radicals, perhydroxyl radicals, peroxyl radicals, alkoxyl radicals, and 

the like. Examples of membrane disrupters include, but are not limited to, application of
electric voltage potentials, Triton X-100, sodium dodecyl sulfate (SDS), and various
detergents. Examples of metabolic blocking and/or energy blocking agents include, but are
not limited to, azidothymidine (AZT), ion (e.g., Ca++, K+, Na+) channel blockers, α and β
adrenoreceptor blockers, histamine blockers, and the like. Examples of chemical inhibitors
include, but are not limited to, receptor antagonists and inhibitory metabolites/catabolites
(for example, mevalonate, which is a product of and in turn inhibits HMG-CoA reductase
activity). 

[0088] Examples of cell surface receptor ligands include, but are not limited to,

various hormones (estrogen, testosterone, other steroids), growth factors, and G-protein- 
coupled receptor ligands. Examples of antibodies include, but are not limited to, antibodies 
directed against TNFα, TRAIL, or the HER2 growth factor receptor.

[0089] Examples of oligonucleotides that can be used to treat samples in the present
invention include, but are not limited to, ribozymes, anti-sense oligonucleotides, iRNA,
siRNA, etc. For example, ribozymes are RNA molecules that have an enzymatic or 
catalytic activity against sequence-specific RNA molecules (see, for example, Intracellular 
Ribozyme Applications: Principles and Protocols , J. Rossi and L. Couture, eds. (1999, 
Horizon Scientific Press, Norfolk, UK)). Ribozymes can be generated against any number 

of RNA sequences, as shown in the literature for a number of target mRNAs including
calretinin, TNFα, HIV-1 integrase, and the human interleukins.

[0090] In one embodiment of the present invention, treating biological samples 

involves administering varying concentrations of the plurality of compounds to a plurality 
of biological samples (e.g., subpopulations of a cell line grown in culture), thereby 

generating a dose-response. The responses can be measured at either a single timepoint or
over a plurality of timepoints. Optionally, at least one measurement is collected prior to 
treatment with the member composition. Commonly, this "zero time point" sample serves 
as a reference or control. Alternatively, or additionally, a separate but comparable 
biological sample (e.g., a subpopulation of the same cell line used for the treated samples) is 

left untreated or unexposed to any exogenous compound for purposes of a reference or
control. 
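
A dose-response experiment of this kind can be laid out as the full combination of compounds, concentrations and timepoints, plus untreated controls. The Python sketch below is one possible way to enumerate such a design; the compound names, concentration values, timepoints, and record layout are all hypothetical and are not specified in the disclosure.

```python
from itertools import product


def dose_response_design(compounds, concentrations, timepoints):
    """List one sample record per compound, concentration and timepoint,
    followed by one untreated control record per timepoint."""
    design = [
        {"compound": compound, "concentration": conc, "timepoint": t}
        for compound, conc, t in product(compounds, concentrations, timepoints)
    ]
    # Untreated samples of the same cell line serve as reference or control.
    design.extend(
        {"compound": None, "concentration": 0.0, "timepoint": t}
        for t in timepoints
    )
    return design


# Hypothetical library members, concentrations (micromolar) and timepoints (hours).
design = dose_response_design(["cmpd_1", "cmpd_2"], [0.1, 1.0, 10.0], [4, 24])
```

Here two compounds at three concentrations and two timepoints give 12 treated samples and 2 untreated controls; a "zero time point" sample could be added in the same way by including 0 among the timepoints.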

Biological samples 

[0091] Expressed RNA samples for use in the screening methods of the present 

invention are obtained from a number of biological sources. Biological samples can be either
prokaryotic or eukaryotic in origin. For example, expressed RNA samples can be obtained
from such biological sources as animals, plants, yeast, fungi, bacteria and viruses which 
have been treated with one or more members of a compound library. Biological samples in 
the context of the present invention include vertebrates, such as mammals, e.g., mice, rats,
hamsters, guinea pigs, rabbits, cats, dogs, primates, humans, and non-mammalian
vertebrates, such as amphibians, e.g., frogs, toads, and fish, such as zebra fish, and other 
species of scientific interest, as well as non-vertebrate species such as nematodes and 
insects, e.g., Drosophila. 

[0092] Most frequently the biological source or sample is a cell line grown in
culture, i.e., an immortalized strain of a cell obtained from a multicellular organism. Cell
lines useful in the methods of the invention include cell lines derived from, for example,
one or more different types of tissues or tumors, primary cell lines, cells which have been 
subjected to transient and/or stable genetic modification, and the like. Optionally, the cells 

are mammalian cells, for example murine, rodent, guinea pig, rabbit, canine, feline, primate
or human cells. Alternatively, the cells can be of non-mammalian origin, derived, for 
example, from frogs, amphibians, or various fishes such as the zebra fish. 
[0093] Cell lines which can be used in the methods of the present invention include, 

but are not limited to, those available from cell repositories such as the American Type 

Culture Collection (www.atcc.org), the World Data Center on Microorganisms
(http://wdcm.nig.ac.jp), the European Collection of Animal Cell Culture (www.ecacc.org) and
the Japanese Cancer Research Resources Bank (http://cellbank.nihs.go.jp). These cell lines
include, but are not limited to, the following cell lines: 293, 293Tet-Off, CHO-AA8 Tet- 
Off, MCF7, MCF7 Tet-Off, LNCap, T-5, BSC-1, BHK-21, Phinx-A, 3T3, HeLa, PC3, 

DU145, ZR 75-1, HS 578-T, DBT, Bos, CV1, L-2, RK13, HTTA, HepG2, BHK-Jurkat,

Daudi, RAMOS, KG-1, K562, U937, HSB-2, HL-60, MDAHB231, C2C12, HTB-26, HTB- 
129, HPIC5, A-431, CRL-1573, 3T3L1, Cama-1, J774A.1, HeLa 229, PT-67, Cos7, OST7, 
HeLa-S, THP-1, and NXA. Additional cell lines can be obtained, for example, from cell 
line providers such as Clonetics Corporation (Walkersville, MD; www.clonetics.com).
Optionally, the expressed RNA samples are derived from cultured cells optimized for the
analysis of a particular disease area of interest, e.g., cancer, inflammation, cardiovascular
disease, infectious diseases, proliferative diseases, an immune system disorder (e.g.,
multiple sclerosis, diabetes, allergy), or a central nervous system disorder (e.g., Alzheimer's
disease, Parkinson's disease).

[0094] A variety of cell culture media for maintaining cells of interest in culture are

described in The Handbook of Microbiological Media , Atlas and Parks (eds) (1993, CRC 
Press, Boca Raton, FL). References describing the techniques involved in bacterial and 
animal cell culture include Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd
Ed.), Vol. 1-3 (1989, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York);
Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols (John
Wiley & Sons, Inc., supplemented through 2002); Freshney, Culture of Animal Cells, a
Manual of Basic Technique, third edition (1994, Wiley-Liss, New York) and the references
cited therein; Humason, Animal Tissue Techniques, fourth edition (1979, W.H. Freeman
and Company, New York); and Ricciardelli, et al. (1989) In Vitro Cell Dev. Biol.
25:1016-1024. Information regarding plant cell culture can be found in Plant Cell and
Tissue Culture in Liquid Systems, by Payne et al. (1992, John Wiley & Sons, Inc. New
York, NY); Plant Cell, Tissue and Organ Culture: Fundamental Methods by Gamborg and
Phillips, eds. (1995, Springer Lab Manual, Springer-Verlag, Berlin), and is also available in
commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from
Sigma-Aldrich, Inc (St Louis, MO) (Sigma-LSRCCC) and the Plant Culture Catalogue and
supplement (1997) also from Sigma-Aldrich, Inc (St Louis, MO) (Sigma-PCCS).

[0095] For example, either primary or immortalized (or other) cell lines are grown 

in a master flask, then trypsinized (if they are adherent) and transferred to a 96-well plate,
seeding each well at a density of 10^4 to 10^6 cells/well. If the gene expression profile in
response to a chemical treatment is sought, the chemical agent of choice is prepared in a 
range of concentrations (further details regarding treatment with, e.g., compound or 
chemical libraries, are provided hereinbelow). After a time of recovery and growth as
appropriate to the cell line, cells are exposed to the chemical for a period of time that will
not adversely impact the viability of the cells. Preferably, assays include a range of 
chemical concentrations and exposure times, and include replicate samples. After 
treatment, typically, the medium is removed and expressed RNA samples are prepared from
the cells. Alternatively, other multiwell plate formats can be employed, such as 6, 12, 48,
384, 1536 wells, etc. Culture formats that do not use conventional flasks (e.g., roller
bottles, plates, etc.), as well as microtiter formats, can also be used. 
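
When samples are tracked across the different multiwell formats mentioned above, a consistent well-naming scheme is helpful. The Python sketch below converts a zero-based sample index into a conventional well identifier (e.g., A1, H12). The row and column counts are the commonly used layouts for these plate sizes; the function names and the row-by-row fill order are assumptions made for illustration.

```python
# Commonly used (rows, columns) layouts for the plate formats named in the text.
PLATE_SHAPES = {
    6: (2, 3),
    12: (3, 4),
    48: (6, 8),
    96: (8, 12),
    384: (16, 24),
    1536: (32, 48),
}


def row_label(row):
    """Convert a zero-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    row += 1
    while row > 0:
        row, rem = divmod(row - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def well_id(index, plate_format=96):
    """Return the well identifier for a zero-based index, filling row by row."""
    rows, cols = PLATE_SHAPES[plate_format]
    if not 0 <= index < rows * cols:
        raise IndexError("index outside plate")
    row, col = divmod(index, cols)
    return f"{row_label(row)}{col + 1}"


first_well = well_id(0)           # first well of a 96-well plate
last_well_96 = well_id(95, 96)    # last well of a 96-well plate
last_well_384 = well_id(383, 384) # last well of a 384-well plate
```

The same index can then be carried through treatment, RNA preparation and arraying, so that each arrayed sample remains traceable to its source well.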
[0096] The choice of cell lines employed in the methods of the present invention 

will vary based upon a number of factors, such as the desired activity, the disease area of 
interest, and the number of relevant cell lines available. Additional considerations, e.g., for 

screening compound libraries for potential drug targets, include, but are not limited to, the
representation of diverse cell types (for example, the use of diverse cancer cell types for 
screening of cancer inhibitory compounds), previous usage in the study of similar
compounds, and sensitivity or resistance to drug treatment. Optionally, the methods are
performed in a high throughput, multiwell format. 

[0097] In some circumstances, cell lines with one or more modifications in a 

biochemical or genetic pathway are employed. The difference between a modified 
(daughter) cell line and a parental (e.g., wild type) cell line can arise, for example, from
changes in the "functional activity" of at least one biological molecule, for example, a 
protein or a nucleic acid. A difference in the functional activity of a biological molecule 
refers to an alteration in an activity and/or a concentration of that molecule, and can include, 
but is not limited to, changes in transcriptional activity, translational activity, catalytic 

10 activity, binding or hybridization activity, stability, abundance, transportation, 

compartmentalization, secretion, or a combination thereof. The functional activity of a 
biological molecule can also be affected by changes in one or more chemical modifications 
of that molecule, including but not limited to adenylation, glycosylation, phosphorylation, 
acetylation, methylation, ubiquitination, and the like. 

[0098] The alteration in activity or concentration of the at least one biological
molecule can also result from treatment of the parental cell line. Furthermore, the alteration
can be a temporary response to treatment, e.g., stimulation or inhibition, or it can be a
permanent change (e.g., a mutation or an irreversible structural modification). Temporary
alterations can be produced by treatment with a variety of chemical stimulatory and
inhibitory molecules, as well as by cell surface receptor ligands, antibodies,
oligonucleotides, ribozymes and/or vectors employing inducible, gene-specific knock-in and
knock-down technologies. Alternatively, cells can be treated with DNA damaging agents,
such as intercalating agents (e.g., ethidium bromide); alkylating agents (e.g.,
ethylnitrosourea and methyl methanesulfonate); hydrogen peroxide; UV irradiation; and
gamma irradiation. Examples of oxidative stress agents include, but are not limited to,
hydrogen peroxide, superoxide radicals, hydroxyl free radicals, perhydroxyl radicals,
peroxyl radicals, alkoxyl radicals, and the like. Examples of metabolic blocking and/or
energy blocking agents include, but are not limited to, azidothymidine (AZT), ion (e.g.,
Ca++, K+, Na+) channel blockers, α and β adrenoreceptor blockers, histamine blockers, and
the like. Examples of chemical inhibitors include, but are not limited to, receptor
antagonists and inhibitory metabolites/catabolites (for example, mevalonate, which is a
product of and in turn inhibits HMG-CoA reductase activity).

[0099] In some cases, it is optionally desirable to subject the cell line (or other
biological sample) to one or more environmental stimuli that affect gene expression prior to
treating with a compound library. For example, a cell line can optionally be exposed to an
environmental condition, or a change in an environmental condition, that results in activation
or suppression of one or more genetic or biochemical pathways. Exemplary environmental
stimuli include changes in temperature, changes in pH, changes in oxygen tension, changes
in carbon dioxide tension, changes in gas composition, changes in atmospheric pressure, or
exposure to light, e.g., visible, ultraviolet, or infrared radiation. Alternatively,
environmental stimuli include agents which either directly or indirectly influence gene
expression, including, e.g., solvents.

[0100] In some cases, expression of one or more genes in the biological sample 

(e.g., cells, tissue or organism) is artificially altered prior to treating the sample with 
members of a compound library. Typically, such an alteration is induced to enhance the 
utility of the biological sample as a model system in which to test for physiological effects 

induced by members of a compound library.

[0101] For example, procedures which alter the genome of the biological sample in 

a permanent manner, such as insertional mutagenesis, deletion of genomic DNA, targeted 
gene disruption, introduction of a genomic or episomal vector, and the like can be used to 
alter expression of one or more genes in a biological sample in a manner which increases its 

utility as a model for compound library screening. Similarly, procedures which alter

expression by interacting with DNA or RNA, such as transcription blocking, antisense DNA 
or RNA, iRNA, ribozymes, DNA binding oligonucleotides and zinc finger proteins can be 
used to impact the expression of one or more genes in the biological sample prior to treating 
the sample with a member of a compound library. 

[0102] Permanent genetic alteration can be produced by a variety of well known
mutagenesis procedures, e.g., to generate mutant or variant cell lines suitable for library
screening. A variety of mutagenesis protocols, such as viral-based mutational techniques,
homologous recombination techniques, gene trap strategies, inaccurate replication
strategies, and chemical mutagenesis, are available and described in the art. These
procedures can be used separately and/or in combination to produce modified cell lines for
use in the methods of the present invention. See, for example, Amsterdam et al. "A
large-scale insertional mutagenesis screen in zebrafish" Genes Dev 1999 Oct 13:2713-2724;
Carter (1986) "Site-directed mutagenesis" Biochem. J. 237:1-7; Crameri and Stemmer
(1995) "Combinatorial multiple cassette mutagenesis creates all the permutations of mutant
and wildtype cassettes" BioTechniques 18:194-195; Inamdar "Functional genomics the
old-fashioned way: chemical mutagenesis in mice" Bioessays 2001 Feb 23:116-120; Ling et al.
(1997) "Approaches to DNA mutagenesis: an overview" Anal Biochem. 254(2):157-178;
Napolitano et al. "All three SOS-inducible DNA polymerases (Pol II, Pol IV and Pol V) are
involved in induced mutagenesis" EMBO J 2000 Nov 19:6259-6265; and Rathkolb et al.
"Large-scale N-ethyl-N-nitrosourea mutagenesis of mice - from phenotypes to genes" Exp
Physiol 2000 Nov 85:635-44. Furthermore, kits for mutagenesis and related techniques
are also available from a number of commercial sources (see, for example, Stratagene
(http://www.stratagene.com/vectors/index2.htm), Clontech
(http://www.clontech.com/retroviral/index.shtml), and the Gateway cloning system from
Invitrogen (http://www.invitrogen.com)). General texts which describe molecular biological
techniques useful in the generation of modified cell lines, including mutagenesis, include
Berger and Kimmel; Sambrook et al.; and Ausubel et al., all supra. Further details
regarding the generation of modified cell lines can be found in, e.g., WO 02/08466 by
Monforte, and WO 01/71023.

[0103] Alternatively, procedures for making targeted gene mutations can be 

employed to modify cell lines prior to treating with members of a compound library. For 
example, a gene can be prevented from expressing any protein (knockout) via a number of 

processes, including deletion of the gene or transcription promoting elements for the gene at
the DNA level within the cell. Knockout modifications generally involve modification of 
the gene or genes within the genome (see, for example, Gonzalez (2001) "The use of gene 
knockout mice to unravel the mechanisms of toxicity and chemical carcinogenesis" Toxicol 
Lett 120:199-208). Knockouts can be either heterozygous (e.g. inactivating only one copy 

of the gene) or homozygous (inactivating both copies of the gene). One exemplary database
of mouse knockouts can be found at http://research.bmn.com (the BioMedNet mouse 
knockout and mutation database). 

[0104] Following, or in conjunction with, mutagenesis procedures, cell lines with
desired modifications are typically selected using one or more experimental techniques to
identify and isolate cells which have been altered in the desired manner. For example, the
selection process can include, but is not limited to, identifying: cells that survive and/or
continue to grow under different environments, stresses and/or stimulation; cells that have
increased or decreased expression of a particular protein that can be used to sort or separate
cells with the altered protein levels (e.g., using flow cytometry to sort cells that are
overexpressing a particular cell surface receptor); and cells that have an altered physical
phenotype that can be identified and selected (e.g., cells arrested in a particular cycle phase,
cells that have altered ability to invade a barrier or translocate, cells that have a different
shape, or cells that have or have not differentiated into a different cell type). Numerous
additional selection methods are known to one of skill in the art and can be employed to
provide cell lines for use in the methods of the present invention.

Isolation of expressed RNA samples 
[0105] Expressed RNA samples are isolated from biological samples using any of a
number of well-known procedures. For example, biological samples are lysed in a
guanidinium-based lysis buffer, optionally containing additional components to stabilize the
RNA. In some embodiments of the present invention, the lysis buffer also contains purified
RNAs as controls to monitor recovery and stability of RNA from cell cultures. Examples of
such purified RNA templates include the Kanamycin Positive Control RNA from Promega
(Madison, WI), and 7.5 kb Poly(A)-Tailed RNA from Life Technologies (Rockville, MD).
Lysates may be used immediately or stored frozen at, e.g., -80°C.
[0106] Optionally, total RNA is purified from cell lysates (or other types of
samples) using silica-based isolation in an automation-compatible, 96-well format, such as
the RNeasy® purification platform (Qiagen, Inc.; Valencia, CA). Alternatively, RNA is
isolated by solid-phase capture using oligo-dT bound to microbeads or cellulose
columns. This method has the added advantages of separating mRNA from genomic DNA
and total RNA, and of allowing transfer of the mRNA-capture medium directly into the
reverse transcriptase reaction. Other RNA isolation methods are contemplated, such as
extraction with silica-coated beads or guanidinium. Further methods for RNA isolation and
preparation can be devised by one skilled in the art.

[0107] Alternatively, the methods of the present invention are performed using 

crude cell lysates, eliminating the need to isolate RNA. RNAse inhibitors are optionally 
added to the crude samples. When using crude cellular lysates, it should be noted that 
genomic DNA can contribute one or more copies of a target sequence, e.g., a gene, 

depending on the sample. In situations in which the target sequence is derived from one or
more highly expressed genes, the signal arising from genomic DNA may not be significant.
But for genes expressed at very low levels, the background can be eliminated by treating the
samples with DNAse, or by using primers that target splice junctions for subsequent
priming of cDNA or amplification products. For example, one of the two target-specific
primers could be designed to span a splice junction, thus excluding DNA as a template. As
another example, the two target-specific primers are designed to flank a splice junction,
generating larger PCR products for DNA or unspliced mRNA templates as compared to
processed mRNA templates. One skilled in the art could design a variety of specialized
priming applications that would facilitate use of crude extracts as samples for the purposes 
of this invention. 
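
As a rough illustration of the flanking-primer design described above, the following Python sketch compares the expected amplicon length from genomic DNA (or unspliced RNA) with that from processed mRNA. The function name, the coordinate convention, and the example coordinates are hypothetical and are not taken from this disclosure.

```python
def amplicon_sizes(span_start, span_end, introns):
    """Return (unspliced_size, spliced_size) for primers flanking introns.

    span_start, span_end: genomic coordinates (0-based, end-exclusive) of the
        region bounded by the outer ends of the two primers.
    introns: list of (start, end) genomic intervals; only introns lying
        entirely inside the amplified span shorten the spliced product.
    """
    unspliced = span_end - span_start
    removed = sum(end - start for start, end in introns
                  if span_start <= start and end <= span_end)
    return unspliced, unspliced - removed


# Hypothetical example: primers bound an 800-base genomic span containing one
# 500-base intron, so the processed mRNA yields a much shorter product.
genomic, spliced = amplicon_sizes(100, 900, [(300, 800)])
print(genomic, spliced)  # 800 300
```

A large size difference of this kind is what allows products templated by genomic DNA to be distinguished from products templated by processed mRNA.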

Nucleic acids corresponding to expressed RNA samples 
[0108] In the methods of the present invention, nucleic acids corresponding to
expressed RNA samples are logically or spatially arrayed, as described in further detail
below. Although expressed RNA samples can be arrayed directly, e.g., on the surface of a
glass microarray slide, it is generally desirable to employ DNA products corresponding to
the expressed RNA sample to improve stability and ease of handling. In some instances,
cDNA products reverse transcribed from the expressed RNA samples according to well
established procedures, e.g., as described in Sambrook, Ausubel, etc., are arrayed. More
typically, DNA products corresponding to expressed RNA samples are amplified prior to
arraying to improve the sensitivity and dynamic range of the assay.
[0109] Expressed RNA samples can be reverse transcribed using non-specific
primers, such as an anchored oligo-dT primer, or random sequence primers. An advantage
of this approach is that the mRNA sample maintains an "unfractionated" quality because the
sites of priming are non-specific, i.e., the products of this RT reaction will serve as template
for any desired target in the subsequent PCR amplification. Another benefit of this approach
is that samples to be archived are stored in the form of DNA, which is more resistant to
degradation than RNA. In certain methods (e.g., described by Chenchik in USPN
5,962,271, and commercially available kits supplied by Clontech, Palo Alto, CA), reverse
transcription of a full length mRNA is initiated using an oligo-dT primer. A cap switching
oligonucleotide primer is annealed to the 5' cap of the mRNA, where it serves as a template
for the nascent strand as it approaches the end of the mRNA template. The cap switching
oligonucleotide primer includes, in addition to the sequence that permits it to bind to the cap,
a polynucleotide sequence that serves as a primer annealing site in subsequent amplification
reactions.

[0110] Alternatively, RNA is converted to cDNA using a target-specific primer
complementary to the RNA for each gene target for which expression data is desired.
Methods for reverse transcription also include the use of thermostable DNA polymerases,
as described in the art. As an exemplary embodiment, avian myeloblastosis virus reverse
transcriptase (AMV-RT), or Moloney murine leukemia virus reverse transcriptase
(MoMLV-RT) is used, although other enzymes are contemplated. An advantage of using
target-specific primers in the RT reaction is that only the desired sequences are arrayed, or
optionally, used, in subsequent amplification reactions.

[0111] Amplification of DNA products corresponding to expressed RNA samples
can be accomplished using the polymerase chain reaction (PCR), which is described in
detail in U.S. Patent Nos. 4,683,195 (Mullis et al.), 4,683,202 (Mullis), and 4,800,159
(Mullis et al.), and in PCR Protocols: A Guide to Methods and Applications (Innis et al.,
eds.) Academic Press Inc. San Diego, CA (1990); see also Sambrook, Ausubel. PCR
utilizes pairs of primers having sequences complementary to opposite strands of target
nucleic acids, and positioned such that the primers are converging. The primers are
incubated with template DNA under conditions that permit selective hybridization. Primers
can be provided in double-stranded or single-stranded form, although the single-stranded
form is preferred. If the target gene(s) sequence is present in a sample, the primers will
hybridize to form a nucleic acid:primer complex. An excess of deoxynucleoside
triphosphates is added, along with a thermostable DNA polymerase, e.g., Taq polymerase.
If the target gene(s):primer complex has been formed, the polymerase will extend the
primer along the target gene(s) sequence by adding nucleotides. After polymerization, the
newly-synthesized strand of DNA is dissociated from its complementary template strand by
raising the temperature of the reaction mixture. When the temperature is subsequently
lowered, new primers will bind to each of these two strands of DNA, and the process is
repeated. Multiple cycles of raising and lowering the temperature are conducted, with a
round of replication in each cycle, until a sufficient amount of amplification product is
produced. 
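
The cycle-by-cycle replication described above can be sketched numerically. The Python example below assumes an idealized model in which each cycle multiplies the number of template copies by (1 + efficiency). The function name, the efficiency parameter, and the example quantities are illustrative assumptions only, and real reactions plateau in later cycles.

```python
def cycles_to_reach(initial_copies, target_copies, efficiency=1.0):
    """Count thermal cycles needed to reach target_copies in an idealized PCR.

    efficiency is the fraction of templates replicated per cycle
    (1.0 means perfect doubling). This ignores reagent depletion and plateau.
    """
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    copies = float(initial_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one round of replication
        cycles += 1
    return cycles


# Hypothetical example: 100 starting templates, 1,000,000 copies desired.
print(cycles_to_reach(100, 1_000_000))                  # 14
print(cycles_to_reach(100, 1_000_000, efficiency=0.8))  # 16
```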

[0112] In one favorable variation of the polymerase chain reaction, nucleic acids
are amplified in a reaction that couples reverse transcription and PCR, "rtPCR." rtPCR
techniques use either gene specific primers to selectively amplify particular gene sequences,
or random or semi-random primers to amplify the global population
of mRNAs or some subset based on the presence of particular sequences or sequence motifs
(see, e.g., USPN 5,962,271). In all forms of operation, the technique provides for the ability
to multiplex to very high levels.

[0113] Alternative methods for amplifying nucleic acids corresponding to expressed
RNA samples include, e.g., transcription-based amplification systems (TAS), such as that
first described by Kwoh et al. (Proc. Natl. Acad. Sci. (1989) 86(4):1173-7), or isothermal
transcription-based systems such as 3SR (Self-Sustained Sequence Replication; Guatelli et
al. (1990) Proc. Natl. Acad. Sci. 87:1874-1878) or NASBA (nucleic acid sequence based
amplification; Kievits et al. (1991) J Virol Methods. 35(3):273-86). In these methods, one
or more mRNA targets of interest are copied into cDNA by a reverse transcriptase. The
primer(s) for cDNA synthesis include the promoter sequence of a designated
DNA-dependent RNA polymerase 5' to the primer's region of homology with the template. In
some procedures a second complementary cDNA strand is synthesized using, e.g., a hairpin
loop structure formed by the initially synthesized cDNA strand (see, e.g., Van Gelder et al.
USPN 5,545,522). Alternatively, a second strand is synthesized from a primer
complementary to a primer sequence added by template switching to an oligonucleotide that
anneals to the 5' cap structure of a full-length mRNA (SMART™ Amplification described
in Chenchik et al. USPN 5,962,271). The resulting cDNA products can then serve as
templates for multiple rounds of transcription by the appropriate RNA polymerase.
Transcription of the cDNA template rapidly amplifies the signal from the original target
mRNA. The isothermal reactions bypass the need for denaturing cDNA strands from their
RNA templates by including RNAse H to degrade RNA hybridized to DNA. Other
methods using isothermal amplification, including, e.g., methods described in USPN
6,251,639, are also favorably employed in the context of the present invention.
[0114] Alternatively, amplification is accomplished by use of the ligase chain
reaction (LCR), disclosed in European Patent Application No. 320,308 (Backman and
Wang), or by the ligase detection reaction (LDR), disclosed in U.S. Patent No. 4,883,750
(Whiteley et al.). In LCR, two probe pairs are prepared, which are complementary to each
other, and to adjacent sequences on both strands of the target. Each pair will bind to
opposite strands of the target such that they are adjacent. Each of the two probe pairs can
then be linked to form a single unit, using a thermostable ligase. By temperature cycling, as
in PCR, bound ligated units dissociate from the target, then both molecules can serve as
"target sequences" for ligation of excess probe pairs, providing for an exponential
amplification. The LDR is very similar to LCR. In this variation, oligonucleotides
complementary to only one strand of the target are used, resulting in a linear amplification
of ligation products, since only the original target DNA can serve as a hybridization
template. It is used following a PCR amplification of the target in order to increase signal. 
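
The contrast between the exponential amplification of LCR and the linear amplification of LDR can be sketched as follows. This Python example assumes an idealized reaction in which every available template is ligated once per cycle; the function names and example quantities are hypothetical.

```python
def ldr_products(target_copies, cycles):
    """Linear model (LDR): only the original target templates ligation,
    so each cycle adds one product per original target copy."""
    return target_copies * cycles


def lcr_products(target_copies, cycles):
    """Exponential model (LCR): ligated units also act as templates,
    so the template pool doubles in every cycle."""
    return target_copies * (2 ** cycles - 1)


# Hypothetical example: 10 target copies followed over several cycle counts.
for n in (1, 5, 10, 20):
    print(n, ldr_products(10, n), lcr_products(10, n))
```

Under these assumptions the two methods give the same yield after one cycle and diverge rapidly afterwards.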
[0115] Additional suitable methods include, but are not limited to, strand
displacement amplification (Walker et al. (1992) Nucleic Acids Res. 20:1691-1696), repair
chain reaction (REF), cyclic probe reaction (REF), solid-phase amplification, including
bridge amplification (Mehta and Singh (1999) BioTechniques 26(6):1082-1086), rolling
circle amplification (Kool, U.S. Patent No. 5,714,320), rapid amplification of cDNA ends
(Frohman (1988) Proc. Natl. Acad. Sci. 85:8998-9002), the "invader assay" (Griffin et al.
(1999) Proc. Natl. Acad. Sci. 96:6301-6306), and methods for simultaneous amplification
and detection as described in, e.g., USPN 5,914,230 and 6,365,346.

[0116] Amplification of expressed RNA samples can be performed using random or
semi-random primers to globally amplify a diverse population of expression products, or
can be performed using target specific primers to amplify one or more selected expression
products. Selective amplification of expression products using target specific primers can
be performed in reactions that amplify a single product or that amplify a plurality of
products, i.e., multiplex amplification reactions. When one or a small number of expression
products is amplified in a single reaction, the products of multiple reactions can be
combined or pooled for arraying, if desired. Similarly, a single expressed RNA sample (i.e.,
from a single biological sample) can be amplified in multiple target specific reactions which
are then arrayed in more than one location of an array. Both of these variations increase
the number of probes which can be analyzed in a single physical array.

Multiplex amplification Strategies 
[0117] An embodiment of the methods of the present invention involves the use of
various PCR multiplexing strategies that are made possible by the combined use of
target-specific (e.g., gene specific) and universal primers. These procedures are variations on the
RT-PCR assays involving the reverse transcription of a single or double stranded DNA
template corresponding to one or more expressed RNA species, followed by amplification
in a PCR. Additional details regarding multiplex PCR strategies are found in, e.g., WO
01/55454 by Loehrlein et al.; and USPN 5,962,271 to Chenchik et al.

[0118] Multiplex amplification of a plurality of target sequences typically involves
combining the plurality of target sequences with a plurality of target-specific primers (i.e.,
primers complementary to at least one strand of a reverse transcribed cDNA target
sequence) and one or more universal primers, to produce a plurality of amplification
products. A multiplex set of target sequences optionally comprises between about two
targets and about 100 targets. In one embodiment of the present invention, the multiplex
reaction includes at least 5 target sequences, but preferably at least ten targets or at least
fifteen targets. Multiplexes of much larger numbers (e.g., about 20, about 50, about 75 and
greater) are also contemplated.

[0119] In one embodiment of the methods of the present invention, at least one of
the amplification targets in the multiplex set is a transcript that is endogenous to the sample
and has been independently shown to exhibit a fairly constant expression level (for
example, a "housekeeping" gene such as β-actin). The signal from this endogenous reference
sequence provides a control for converting signals of other gene targets into relative
expression levels. Optionally, a plurality of control mRNA targets/reference sequences that
have relatively constant expression levels may be included in the multiplexed amplification
to serve as controls for each other. Alternatively, a defined quantity of an exogenous
purified RNA species is added to the multiplex reaction or to the cells, for example, with the
lysis reagents. Almost any purified, intact RNA species can be used, e.g., the Kanamycin
Positive Control RNA or the 7.5 kb Poly(A)-Tailed RNA mentioned previously. This
exogenously-added amplification target provides a way to monitor the recovery and stability
of RNA from cell cultures. It can also serve as an exogenous reference signal for
converting the signals obtained from the sample mRNAs into relative expression levels. In
still another embodiment, a defined quantity of a purified DNA species is added to the PCR
to provide an exogenous reference target for converting the signals obtained from sample 
mRNA targets into relative expression levels. 
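
One simple way to picture the conversion of raw signals into relative expression levels is to divide each target signal by the reference signal measured in the same reaction. The Python sketch below assumes background-corrected signals and a single reference; the function name, gene labels, and signal values are hypothetical and are not data from this disclosure.

```python
def relative_expression(signals, reference):
    """Divide each target signal by the reference signal from the same reaction.

    signals: mapping of target name to measured signal.
    reference: name of the endogenous or exogenous reference target.
    """
    ref_signal = signals[reference]
    if ref_signal <= 0:
        raise ValueError("reference signal must be positive")
    return {gene: value / ref_signal
            for gene, value in signals.items() if gene != reference}


# Hypothetical signals from one multiplex reaction, with a housekeeping
# transcript (labeled ACTB here) as the endogenous reference.
well = {"ACTB": 2000.0, "gene_A": 500.0, "gene_B": 4000.0}
print(relative_expression(well, "ACTB"))  # {'gene_A': 0.25, 'gene_B': 2.0}
```

The same calculation applies when the reference is an exogenous RNA or DNA species added in a defined quantity.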

[0120] In one embodiment of the present invention, once the targets that comprise a
multiplex set are determined, primer pairs complementary to each target sequence are
designed, including both target-specific and universal primers. This can be accomplished
using any of several software products that design primer sequences, such as OLIGO
(Molecular Biology Insights, Inc., CO), Gene Runner (Hastings Software Inc., NY), or
Primer3 (The Whitehead Institute, MA). Target-specific primers (TSPs) include at least two
portions. The first portion includes a region complementary to a selected "universal
sequence." The universal sequence is utilized to allow amplification of multiple targets
(having divergent sequences) while using the same primer (e.g., the universal primer, or UP).
The universal sequence is contained only in the primers, and preferably is not present in any
nucleic acid (or complement thereof) provided by the sample being tested. A second portion
of the TSPs, within the 3' region of the sequence, is complementary to and will hybridize with
one of a plurality of designated target sequences. Although a single universal primer is
described in the example provided above, multiple universal primers having different or
unique sequences or labels can be employed in the methods of the present invention. If a
single UP is used, the universal sequence will be the same within all TSPs. If a UP pair is
to be used, the universal sequence will be different in the forward and reverse primers of the
TSPs. The UP may also contain a detectable label on at least one of the primers, such as a
fluorescent chromophore. Both the target-specific and universal sequences are of sufficient
length and sequence complexity to form stable and specific duplexes, allowing
amplification and detection of the target gene. In early rounds of the amplification,
replication is primed primarily by the TSPs. The first round will add the universal sequence
to the 5' regions of the amplification products. The second cycle will generate sequence
complementary to the universal sequence within the 3' region of the complementary strand,
creating a template that can be amplified by the universal primers alone. Optionally, the
reaction is designed to contain limiting amounts of each of the TSPs and a molar excess of
the UP, such that the UP will generally prime replication once its complementary sequence
has been established in the template. The molar excess of UP over a TSP can range from
about 5:1 to about 100:1; optionally, the reaction utilizes approximately 10:1 molar excess
of UP over the amount of each TSP. Because all of the TSPs contain the same universal
sequence, the same universal primer will amplify all targets in the multiplex, eliminating the
quantitative variation that results from amplification from different primers. 
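
The two-part primer layout and the molar excess of UP described above can be sketched in a few lines of Python. The sequences, gene labels, and function names below are hypothetical placeholders chosen for illustration, and the strict range check is a simplification of the "about 5:1 to about 100:1" range.

```python
def make_tsp(universal_seq, target_specific_seq):
    """Return a TSP written 5'->3': universal sequence in the 5' region,
    target-specific sequence in the 3' region."""
    return universal_seq.upper() + target_specific_seq.upper()


def up_amount(tsp_amount, molar_excess=10.0):
    """Amount of UP to add for a given amount of each TSP (same units)."""
    if not 5.0 <= molar_excess <= 100.0:
        raise ValueError("molar excess outside the 5:1 to 100:1 range")
    return tsp_amount * molar_excess


# Hypothetical sequences; a single UP means every TSP carries the same
# universal sequence.
UNIVERSAL = "GGCCGCGGATCCGCGGCCGC"
targets = {"gene_A": "ATGCATTAGCTAGATTACGA", "gene_B": "TTGACCGTAAGCTTCAGGAT"}
tsps = {name: make_tsp(UNIVERSAL, seq) for name, seq in targets.items()}

for name, primer in tsps.items():
    print(name, primer)
print(up_amount(20.0))  # 200.0, i.e. a 10:1 excess over a TSP present at 20
```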
[0121] The templates are initially single-stranded mRNA molecules, but eventually 

are predominantly DNA amplification products that serve as template in subsequent cycles. 
Messenger RNA is converted to cDNA by the action of reverse transcriptase polymerization 

from the target-specific reverse primers, or from a random or degenerate primer that results
in global reverse transcription of the constituents of an expressed RNA sample. If a single 
stranded cDNA template has been synthesized, the target-specific forward primers and the 
universal forward and reverse primers are added along with a thermostable polymerase to 
generate the second strand of cDNA, followed by PCR amplification. The UP can anneal to 

target DNA only after its complementary universal sequence is added to the opposite strand
through replication across the 5' region of the TSP. 

[0122] The length of complementary sequence between each primer and its binding
partner (i.e., the target sequence or the universal sequence) should be sufficient to allow
hybridization of the primer only to its target within a complex sample at the annealing
temperature used for the PCR. A complementary sequence of, for example, about 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, or 25 or more nucleotides is preferred for both the
target-specific and universal regions of the primers. A particularly preferred length of each
complementary region is about 20 bases, which will promote formation of stable and
specific hybrids between the primer and target.

[0123] Optionally, primers are designed such that the annealing temperature of the
universal sequence is higher than that of the target-specific sequences. Methods
employing these primers further include increasing the annealing temperature of the
reaction after the first few rounds of amplification. This increase in reaction temperature
suppresses further amplification of sample nucleic acids by the TSPs, and drives
amplification by the UP. Depending on the application envisioned, one skilled in the art can
employ varying conditions of hybridization to achieve varying degrees of selectivity of
primer towards the target sequence. For example, varying the stringency of hybridization or
the position of primer hybridization can reveal divergence within gene families.
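
A quick way to check that a universal region is expected to anneal at a higher temperature than a target-specific region is to compare estimated melting temperatures. The Python sketch below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a rough approximation for short oligonucleotides that is swapped in here for illustration and is not a method stated in this disclosure. The sequences are hypothetical.

```python
def wallace_tm(seq):
    """Estimate melting temperature (deg C) with the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical 20-base regions of one primer.
universal_region = "GGCCGCGGATCCGCGGCCGC"
target_region = "ATGCATTAGCTAGATTACGA"

# True when the universal region is predicted to tolerate a raised
# annealing temperature better than the target-specific region.
universal_is_higher = wallace_tm(universal_region) > wallace_tm(target_region)
print(universal_is_higher)
```

In practice a nearest-neighbor calculation from primer design software would give better estimates; the simple rule is used here only to keep the example self-contained.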

[0124] Optionally, each candidate primer is shown or proven to be compatible with 

the other primers used in a multiplex reaction. In a preferred embodiment, each target- 
specific primer pair produces a single amplification product of a predicted size from a 
sample minimally containing all of the targets of the multiplex, and more preferably from a 

crude RNA mixture. Preferably, amplification of each individual target by its

corresponding primers is not inhibited by inclusion of any other primers in the multiplex. 
None of the primers, either individually or in combination, should produce spurious 
products. These issues are easily addressed by one of skill in the art without the need for 
excessive experimentation. 

[0125] Oligonucleotide primers are typically prepared by the phosphoramidite
approach. In this automated, solid-phase procedure, each nucleotide is individually added to
the 5'-end of the growing oligonucleotide chain, which is in turn attached at the 3'-end to a
solid support. The added nucleotides are in the form of trivalent 3'-phosphoramidites that
are protected from polymerization by a dimethoxytrityl ("DMT") group at the 5'-position.
After base induced phosphoramidite coupling, mild oxidation to give a pentavalent
phosphotriester intermediate and DMT removal provide a new site for oligonucleotide
elongation. These syntheses may be performed on, for example, a Perkin Elmer/Applied
Biosystems Division DNA synthesizer. The oligonucleotide primers are then cleaved off the
solid support, and the phosphodiester and exocyclic amino groups are deprotected with
ammonium hydroxide.

Elimination of Variations in Primer Annealing Efficiency 
[0126] Variations in primer length and sequence can have a large impact on the
efficiency with which primers anneal to their target and prime replication. In a typical
multiplexed reaction in which each product is amplified by a unique primer pair, the relative
quantities of amplified products may be significantly altered from the relative quantities of
targets due to differences in annealing efficiencies. Embodiments of the methods of the
present invention that couple the use of target-specific primers and universal primers
eliminate this bias, producing amplification products that accurately reflect relative mRNA
levels.

Attenuation of Strong Signals 
[0127] The targets included in a multiplex reaction should generally all yield signal
strengths within the dynamic range of the detection platform used, in order for quantitation
of gene expression to be accurate. In some embodiments, it may be desirable or necessary
to include a very highly expressed gene in a multiplex assay. However, the highly
expressed gene can interfere with quantitation for other genes expressed at very low levels
if its signal is not attenuated. The methods of the current invention provide ways for
attenuating the signals of relatively abundant targets during the amplification reaction such
that they can be included in a multiplexed set without impacting the accuracy of
quantitation of that set.
[0128] Amplification primers are optionally used that block polymerase extension of 

the 3' end of the primer. One preferred embodiment is modification of the 3'-hydroxyl of 
the oligonucleotide primer by addition of a phosphate group. Another preferred 

embodiment is attachment of the terminal nucleotide via a 3'-3' linkage. One skilled in the
art can conceive of other chemical structures or modifications that can be used for this 
purpose. The modified and the corresponding unmodified primer for the highly abundant 
target are mixed in a ratio empirically determined to reduce that target's signal, such that it 
falls within the dynamic range of other targets of the multiplex. Preferably, the reverse 

target-specific primer is modified, thereby attenuating signal by reduction of the amount of
template created in the reverse transcriptase reaction. 

[0129] Another embodiment for signal attenuation entails use of a target-specific 

primer that contains the target-specific sequence, but no universal primer sequence. This 
abbreviated primer (lacking the universal sequence) and the corresponding primer
containing the universal sequence within the 5' region are mixed in a ratio empirically 
determined to reduce that target's signal, such that it then falls within the dynamic range of 
other targets of the multiplex system. 
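
The mixing ratio described above is determined empirically. As a starting point for such a titration, the following minimal sketch estimates the fraction of active primer to try first. It assumes, purely for illustration, that signal scales linearly with the fraction of primer able to yield amplifiable product; the function names and the example amounts are hypothetical and do not come from the disclosure.

```python
def extendable_fraction(observed_signal: float, target_signal: float) -> float:
    """Fraction of active (extendable or universal-tagged) primer to try first.

    Assumption for illustration only: signal is proportional to the fraction
    of primer that can give rise to amplifiable product.
    """
    if observed_signal <= 0 or target_signal <= 0:
        raise ValueError("signals must be positive")
    # No attenuation is needed if the signal is already at or below the target.
    return min(1.0, target_signal / observed_signal)


def primer_mix(total_pmol: float, fraction_active: float) -> dict:
    """Split a fixed total primer amount into active and attenuating primer."""
    if not 0.0 <= fraction_active <= 1.0:
        raise ValueError("fraction_active must be between 0 and 1")
    active = total_pmol * fraction_active
    return {"active_pmol": active, "attenuating_pmol": total_pmol - active}


# Example: a signal ten-fold above the desired level suggests starting the
# titration near one part active primer to nine parts attenuating primer.
fraction = extendable_fraction(observed_signal=50000.0, target_signal=5000.0)
mix = primer_mix(total_pmol=10.0, fraction_active=fraction)
```

The estimate is only a first guess; the final ratio would still be confirmed experimentally, as the text indicates.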

Purification of rtPCR products

[0130] It is often desirable to "purify" the population of nucleic acids corresponding 

to expressed RNA samples (e.g., rtPCR products), prior to deposit on an array, due to 

presence of contaminants and salts. Numerous approaches to purifying nucleic acids, such 

as PCR products, exist, with the two principal high throughput approaches being filtration
in microtiter-plate format and magnetic bead capture and washing. For example, the

Millipore Montage PCR96 DNA purification plates (and comparable 384-well version of 

this plate) are favorably employed in the context of the present invention. The protocol for 

use involves a simple one-step vacuum filtration and elution of the PCR products, and is 

compatible with automated systems, such as the Biomek Multimek system. Alternatively, 

magnetic bead capture and washing approaches can be adapted for an automated platform.

Array format 

[0131] Nucleic acids corresponding to expressed RNA samples, whether
RNA, cDNA or amplification products, are then spatially or logically arrayed. Numerous
technological platforms for performing high throughput expression analysis using nucleic
acid arrays are available. Common array formats include both liquid and solid phase arrays.
For example, assays employing liquid phase arrays, e.g., for hybridization of nucleic acids,
can be performed in multiwell, or microtiter, plates. Microtiter plates with 96, 384 or 1536
wells are widely available, and even higher numbers of wells, e.g., 3456 and 9600, can be
used. In general, the choice of microtiter plates is determined by the methods and
equipment, e.g., robotic handling and loading systems, used for sample preparation and
analysis. Exemplary systems include, e.g., the ORCA™ system from Beckman-Coulter,
Inc. (Fullerton, CA) and the Zymate systems from Zymark Corporation (Hopkinton, MA). 
[0132] Alternatively, a variety of solid phase arrays can favorably be employed to 

determine expression patterns in the context of the present invention. Exemplary formats 

include membrane or filter arrays (e.g., nitrocellulose, nylon), pin arrays, and bead arrays
(e.g., in a liquid "slurry"). Typically, nucleic acids corresponding to expressed RNA
samples are immobilized, for example by direct or indirect cross-linking, to the solid
support. Essentially any solid support capable of withstanding the reagents and conditions
necessary for performing the particular expression assay can be utilized. For example,
functionalized glass, silicon, silicon dioxide, modified silicon, any of a variety of polymers,
such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or
combinations thereof can all serve as the substrate for a solid phase array. Coated forms of
these materials, e.g., glass coated with polyamine, polyacrylamide, polythymidine or other
functionalization leading to improved non-covalent or covalent binding, can also be used. The substrate can
be a single contiguous surface, e.g., a plate, or multiple discrete surfaces, e.g., etched plates,
filters, or optical fiber ends. Alternatively, the array can be composed of a series of beads
that can be discretely identified via either color coding schemes (e.g., Luminex)
and flow cytometry or means to physically trap the beads on a surface (e.g., Illumina or
Lynx). Techniques for the creation and use of these arrays are known to those skilled in the
art.

[0133] In a preferred embodiment, the array is a "chip" or "slide" composed, e.g., of 

one of the above specified materials, such as a glass microarray slide. Most commonly, 

nucleic acid samples corresponding to expressed RNA samples are deposited, e.g.,

"spotted" onto the chip or slide to produce a spatial array in which each distinct nucleic acid 
population corresponding to a different expressed RNA sample (e.g., derived from a 
different biological sample) is assigned a unique location on the microarray surface. 
Application of nucleic acid samples to the substrate can be performed using automated devices,
or manually, for example, using a multipin, e.g., 32 pin, tool, with an alignment device (e.g.,
from Xenopore, that can deposit up to 768 spots of 6 nl each onto a glass slide). Detailed discussions of
methods for linking nucleic acids to a substrate are found in, e.g., US Patent No. 5,837,832
"Arrays of Nucleic Acid Probes on Biological Chips" to Chee et al., issued November 17,
1998; US Patent No. 6,087,112 "Arrays with Modified Oligonucleotide and Polynucleotide
Compositions" to Dale, issued July 11, 2000; US Patent No. 5,215,882 "Method of

Immobilizing Nucleic Acid on a Solid Substrate for Use in Nucleic Acid Hybridization 
Assays" to Bahl et al., issued June 1, 1993; US Patent No. 5,707,807 "Molecular Indexing 
for Expressed Gene Analysis" to Kato, issued January 13, 1998; US Patent No. 5,807,522 
"Methods for Fabricating Microarrays of Biological Samples" to Brown et al., issued 

September 15, 1998; US Patent No. 5,958,342 "Jet Droplet Device" to Gamble et al., issued
Sept. 28, 1999; US Patent 5,994,076 "Methods of Assaying Differential Expression" to 
Chenchik et al., issued Nov. 30, 1999; US Patent No. 6,004,755 "Quantitative Microarray 
Hybridization Assays" to Wang, issued Dec. 21, 1999; US Patent No. 6,048,695 



"Chemically Modified Nucleic Acids and Methods for Coupling Nucleic Acids to Solid 
Support" to Bradley et al., issued April 11, 2000; US Patent No. 6,060,240 "Methods for 
Measuring Relative Amounts of Nucleic Acids in a Complex Mixture and Retrieval of
Specific Sequences Therefrom" to Kamb et al., issued May 9, 2000; US Patent No.
6,090,556 "Method for Quantitatively Determining the Expression of a Gene" to Kato,
issued July 18, 2000; US Patent 6,040,138 "Expression Monitoring by Hybridization to
High Density Oligonucleotide Arrays" to Lockhart et al., issued March 21, 2000; NHGRI
Microarray Project Protocols: www.nhgri.nih.gov/DIR/Microarray/protocols.html;
MacGregor P, Microarray protocol:
www.uhnres.utoronto.ca/services/microarray/download/protocols/procol_edward.pdf; and
Hegde et al. (2000) Biotechniques 29:548-562.

[0134] As the number of probes to be hybridized (i.e., the number of genes or 

sequences to be analyzed) increases, it is often desirable to produce replicates or copies of
the microarray. The following illustrates one exemplary automatable array copying format,
e.g., for producing replicate microarrays incorporating copies of the nucleic acids
corresponding to RNA expression products from biological samples. For example, arrays
can be copied in an automated format to produce duplicate arrays, master arrays, amplified
arrays and the like, e.g., where repeated hybridization and washing of defined sequence
probes makes recovery or detection of nucleic acids from an original array problematic (e.g.,
where a process to be performed destroys the original nucleic acids or attenuates the signal).
Copies can be made from master arrays, reaction mixture arrays or any duplicates thereof. 
[0135] For example, nucleic acids (e.g., a plurality of expressed RNA samples from 

biological sources) can be dispensed into one or more master multiwell plates and, typically,
amplified (e.g., by PCR) to produce a master array of amplification products. The array copy
system then transfers aliquots from the
wells of the one or more master multiwell plates to one or more copy multiwell plates. 
Typically, a fluid handling system will deposit copied array members in destination 
locations, although non-fluid based member transport (e.g., transfer in a solid or gaseous 
phase) can also be performed. 
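
As an illustration of the well-to-well bookkeeping such a copy system might perform, the following minimal sketch builds a transfer list that interleaves up to four 96-well master plates into one 384-well copy plate. The quadrant-interleaved layout, the function names and the plate numbering are assumptions made for this example; the disclosure does not prescribe a particular layout.

```python
import string


def well_name(row: int, col: int) -> str:
    """Convert zero-based (row, col) to a label such as 'A1' (rows A-P)."""
    return f"{string.ascii_uppercase[row]}{col + 1}"


def transfer_map(n_master_plates: int = 4, rows: int = 8, cols: int = 12):
    """List (master_plate, master_well, copy_well) tuples.

    Each 96-well master plate is assigned to one of the four interleaved
    quadrants of a single 384-well copy plate (hypothetical layout).
    """
    if not 1 <= n_master_plates <= 4:
        raise ValueError("one 384-well plate holds at most four 96-well plates")
    transfers = []
    for plate in range(n_master_plates):
        # Plates 1-4 map to (row, column) offsets (0,0), (0,1), (1,0), (1,1).
        row_offset, col_offset = divmod(plate, 2)
        for r in range(rows):
            for c in range(cols):
                dest = well_name(2 * r + row_offset, 2 * c + col_offset)
                transfers.append((plate + 1, well_name(r, c), dest))
    return transfers


# Example: well A1 of master plate 2 is copied to well A2 of the copy plate.
plan = transfer_map()
```

A list of this kind could be handed to whatever fluid handling system is in use; the same bookkeeping applies when copies are made from a copy rather than from the master.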

[0136] Arraying techniques for producing both master and duplicate arrays from
populations of nucleic acids can involve any of a variety of methods. For example, when
forming solid phase arrays (e.g., as a copy of a liquid phase array, or as an original array),
members of the population can be lyophilized or baked on a solid surface to form a solid
phase array, or chemically coupled or printed (e.g., using ink-jet printing or chip-masking
and photo-activated synthesis methods) to the solid surface.

Expression Profiling 

[0137] The plurality of probes (e.g., set of genes or gene products) selected for 

analysis can be selected, for example, by scanning the literature or by performing empirical
studies. In one embodiment, the probes are selected from among genes (or gene products)
that (a) are expressed at detectable levels within the biological samples, and (b) are likely to
change as a result of exposure to one or more member compositions. Two types of genes
(or their respective gene products) are typically monitored during generation of the genetic
response profile: genes that are empirical responders (i.e., marker genes) and genes that are
known or suspected to be involved in the pathways or disease area of interest (i.e., disease 
related genes). Optionally, one or more genes known to be affected by at least one 
composition in the set of compounds or chemicals are monitored (e.g., a positive control). 
[0138] Typically, a moderate to large number of genes (i.e., expressed RNAs) are 

selected for analysis, i.e., expression (or response) profiling. Such a set of genes commonly
includes at least three polynucleotide sequences, more commonly between about 10 and 
about 20 sequences, often about 50 sequences, sometimes about 100, and occasionally as 
many as about 1000, or more individual polynucleotide sequences, e.g., corresponding to 
different or distinct genes. Nucleic acid sequences that can be monitored in the methods of 

the present invention include, but are not limited to, those listed with the National Center for
Biotechnology Information (www.ncbi.nlm.nih.gov) in the GenBank® databases, and
sequences provided by other public or commercially-available databases (for example, the
NCBI EST sequence database; the EMBL Nucleotide Sequence Database; Incyte's (Palo
Alto, CA) LifeSeq™ database; and Celera's (Rockville, MD) "Discovery System"™
database). For example, nucleic acids that can be monitored (e.g., as part of the genetic
response profile) according to the methods of the present invention include nucleic acids
encoding proteins including, but not limited to, signaling proteins, regulatory proteins, 
pathway specific proteins, receptor proteins, and other proteins involved in one or more 
biochemical pathways. 

Analysis of Gene Expression Data

[0139] Patterns of gene expression in expressed RNA samples can be evaluated by 

either (or both) qualitative and quantitative measures. Certain of the above described
techniques for evaluating gene expression (as RNA or protein products) yield data that are
predominantly qualitative in nature. That is, the methods detect differences in expression
that classify expression into distinct modes without providing significant information
regarding quantitative aspects of expression. For example, a technique can be described as
a qualitative technique if it detects the presence or absence of expression of a candidate
gene, i.e., an on/off pattern of expression. Alternatively, a qualitative technique measures
the presence (and/or absence) of different alleles, or variants, of a gene product. 
[0140] In contrast, some methods provide data that characterize expression in a
quantitative manner. That is, the methods relate expression on a numerical scale, e.g., a
scale of 0-5, a scale of 1-10, a scale of + to +++, from grade 1 to grade 5, a grade from a to z,
or the like. It will be understood that the numerical and symbolic examples provided are
arbitrary, and that any graduated scale (or any symbolic representation of a graduated scale) 
can be employed in the context of the present invention to describe quantitative differences 
in gene expression. Typically, such methods yield information corresponding to a relative 
increase or decrease in expression. 

[0141] Any method that yields either quantitative or qualitative expression data is
suitable for evaluating signals corresponding to hybridization between a defined sequence
probe (e.g., corresponding to a gene, such as a disease related gene) and an arrayed nucleic
acid sample. In some embodiments, it is useful to quantitate the level of expression of a
gene relative to other expression products, and/or relative to a control sequence. One
convenient and broadly applicable method of determining relative expression and
hybridization levels between expression products on an array, as well as between physical
arrays, is to compare the expression of one or more genes of interest to the expression of a
control gene, such as a housekeeping gene (e.g., HSP 70, β-actin, etc.). One or more defined
sequence probes specific for the genes of interest are hybridized along with a probe specific
for the selected housekeeping gene. Hybridization to each of the probes is detected and
quantitated. Then the hybridization signal corresponding to the genes of interest is
compared to that for the housekeeping gene. Expression can then be expressed relative to
that of the housekeeping gene, which is expected to be approximately constant
within and between samples.
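
The normalization just described can be sketched in a few lines. The example below divides each gene's hybridization signal by the signal for the housekeeping gene measured in the same arrayed sample; the function name, gene labels and signal values are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_expression(signals: dict, control: str) -> dict:
    """Express each gene's signal as a ratio to the control gene's signal.

    signals: mapping of gene name -> hybridization signal for one sample.
    control: name of the housekeeping gene used as the reference.
    """
    control_signal = signals[control]
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return {
        gene: value / control_signal
        for gene, value in signals.items()
        if gene != control
    }


# Example with made-up signal values for a single arrayed sample.
spot = {"geneA": 500.0, "geneB": 2000.0, "ACTB": 1000.0}
ratios = relative_expression(spot, control="ACTB")
```

Because the ratio is taken within each spot, it also compensates for differences in the amount of nucleic acid deposited from spot to spot and from array to array.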

[0142] In order to ascertain whether the observed expression data, e.g., a change in

expression profiles in response to one or more treatments of a biological sample, are 
significant, and not just a product of experimental noise or population heterogeneity, an 
estimate of a probability distribution can be constructed for each genetic and phenotypic 
endpoint in each biological sample. Construction of the estimated population distribution
involves running multiple independent experiments for each treatment, e.g., all experiments
are run in duplicate, triplicate, quadruplicate or the like.
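
One simple way to use such replicates is sketched below: each treatment is summarized by its mean, standard deviation and standard error, and a Welch t statistic compares treated and control replicates. The choice of Welch's statistic is an assumption made for this illustration (the disclosure does not name a specific test), and the replicate values are made up.

```python
import math
import statistics


def summarize(replicates):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    if len(replicates) < 2:
        raise ValueError("at least two replicates are needed")
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    return mean, sd, sd / math.sqrt(len(replicates))


def welch_t(treated, control):
    """Welch's t statistic for a difference in mean expression."""
    mean_t, sd_t, _ = summarize(treated)
    mean_c, sd_c, _ = summarize(control)
    pooled_se = math.sqrt(sd_t ** 2 / len(treated) + sd_c ** 2 / len(control))
    if pooled_se == 0.0:
        raise ValueError("replicates show no variation")
    return (mean_t - mean_c) / pooled_se


# Example: triplicate normalized signals for one gene, treated versus control.
t_stat = welch_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
```

A larger absolute value of the statistic indicates a change that is less likely to reflect experimental noise alone; with only two or three replicates the estimate is necessarily rough.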

[0143] Analysis of the data involves the use of a number of statistical tools to 

evaluate the measured expression as extrapolated from the hybridization signal, e.g.,
responses and changes resulting from one or more treatments of a biological sample, based
on type of change, direction of change, shape of the curve in the change, timing of the 
change and amplitude of change. 

[0144] Multivariate statistics, such as principal components analysis (PCA), factor 

analysis, cluster analysis, n-dimensional analysis, difference analysis, multidimensional
scaling, discriminant analysis, and correspondence analysis, can be employed to 
simultaneously examine multiple variables for one or more patterns of relationships (for a 
general review, see Chatfield and Collins, Introduction to Multivariate Analysis, published
1980 by Chapman and Hall, New York; and Höskuldsson, Agnar, Prediction Methods in
Science and Technology, published 1996 by John Wiley and Sons, New York).

Multivariate data analyses are used for a variety of applications involving these multiple 
factors, including quality control, process optimization, and formulation determinations. 
The analyses can be used to determine whether there are any trends in the data collected, 
whether the properties or responses measured are related to one another, and which 
properties are most relevant in a given context (for example, a disease state). Software for
statistical analysis is commonly available, e.g., from Partek Inc. (St. Peters, MO; see 
www.partek.com). 

[0145] One common method of multivariate analysis is principal component 

analysis (PCA, also known as a Karhunen-Loeve expansion or Eigen-XY analysis). PCA 

can be used to transform a large number of (possibly) correlated variables into a smaller
number of uncorrelated variables, termed "principal components." Multivariate analyses
such as PCA are known to one of skill in the art, and can be found, for example, in Roweis
and Saul (2000) Science 290:2323-2326 and Tenenbaum et al. (2000) Science 290:2319-2322.
Several methods of constructing and analyzing dataspace, e.g., including multivariate
analysis, are available. See, e.g., Hinchliffe (1996) Modeling Molecular Structures John
Wiley and Sons, NY, NY; Gibas and Jambeck (2001) Bioinformatics Computer Skills 
O'Reilly, Sebastopol, CA; Pevzner (2000) Computational Molecular Biology: An
Algorithmic Approach, The MIT Press, Cambridge MA; Durbin et al. (1998) Biological



Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , Cambridge 
University Press, Cambridge, UK; Rashidi and Buehler (2000) Bioinformatic Basics: 
Applications in Biological Science and Medicine , CRC Press LLC, Boca Raton, FL; and 
Mount (2001) Bioinformatics: Sequence and Genome Analysis , Cold Spring Harbor Press, 
New York.
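
To make the idea of principal components concrete, the following minimal sketch extracts only the first principal component of a small expression matrix by power iteration on the sample covariance matrix. It is a teaching example written in plain Python under simplifying assumptions (few genes, at least two samples, a start vector not orthogonal to the dominant direction); it is not the software referred to above, and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of expression values (one per gene).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the genes (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its dominant eigenvector unless the start vector happens
    # to be orthogonal to it.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("data have no variance along the start vector")
        vec = [x / norm for x in nxt]
    # Fix the arbitrary sign so the largest-magnitude loading is positive.
    if max(vec, key=abs) < 0:
        vec = [-x for x in vec]
    # Project each centered sample onto the component to obtain its score.
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Example: two perfectly correlated genes collapse onto a single component.
direction, scores = first_principal_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
)
```

In practice a numerical library would be used to obtain all components at once; the sketch only shows how correlated variables reduce to a smaller number of uncorrelated ones.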

[0146] The expression data from multiple biological samples can be grouped, or 

clustered, using multivariate statistics. Clusters for each different stimulation (treating) and 
observation (detecting) experiment are compared and a secondary set of 
correlations/noncorrelations are made. Based on these different sets of correlations, a 

network map can be created wherein the relative relationships of the different genetic
elements can be established as well as how they may act in concert. In addition, the data 
can be visualized using graphical representations. Thus, the temporal changes exhibited by 
the different biochemical and genetic elements within a genetically-related group of cell
lines can be transformed into information reflecting the functioning of the cells within a
given environment.

[0147] For example, in the context of screening a compound (e.g., chemical) library,

compounds that evoke a similar genetic response are likely to share one or more 
mechanisms of action. Through analysis of a set of compounds and/or chemical analogues, 
pathway specific inhibitors and comparable pharmacophores, the mechanistic differences 

and commonalities can be elucidated. A difference analysis provides the means to identify
one or more elements responsible for the desired activity or phenotypic response. In
addition, the dose response data coupled with the difference analysis enables the creation of
a mechanism of action (MOA) model. Libraries of compositions can be screened for their
ability to evoke a genetic response profile similar to that targeted for the desired activity.
Furthermore, compositions can be tested against the MOA model to assess if they stimulate
similar mechanisms of response. 

[0148] Different experimental outcomes are compared by the similarity of the 

pattern of expression profiles generated. This similarity is revealed using, for example, 
clustering analysis. A number of clustering algorithms are commonly used for this type of 
study (see JA Hartigan (1975) Clustering Algorithms, Wiley, NY). The comparisons

between profiles can be performed at the level of individual genes, clusters of genes known 
to be involved in specific pathways or mechanisms, individual cell lines, or for the entire 
experimental data set. For example, for each experimental pair, e.g., two different
composition treatment sets, a distance metric can be defined as D = 1 - ρ, where ρ is the
correlation coefficient between the expression profiles. The value of D indicates the level
of similarity between the two members of an experimental pair. In this manner, a matrix can be created
wherein chemicals producing similar profiles closely cluster, i.e., D is small, and those with
divergent profiles will have large D values. This type of analysis can reveal, for example,
similarities in the mechanism of response of various chemicals. Furthermore, analysis
among similar cell types and between different cell types is used to determine what cell,
tissue, organ or tumor types may be more or less vulnerable when exposed to a given
chemical.
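
The correlation-based distance defined above can be computed as in the following minimal sketch, which builds a pairwise matrix of D values (one minus the Pearson correlation coefficient) from expression profiles. The use of the Pearson coefficient is an assumption, since the text does not say which correlation coefficient is intended; the compound labels and profile values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


def distance_matrix(profiles: dict) -> dict:
    """Pairwise distances D = 1 - correlation for named expression profiles."""
    names = list(profiles)
    return {
        a: {b: 1.0 - pearson(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Example: compounds 1 and 2 evoke proportional responses (D near 0),
# while compound 3 evokes the opposite pattern (D near 2).
profiles = {
    "cmpd1": [1.0, 2.0, 3.0, 4.0],
    "cmpd2": [2.0, 4.0, 6.0, 8.0],
    "cmpd3": [4.0, 3.0, 2.0, 1.0],
}
D = distance_matrix(profiles)
```

With this definition D ranges from 0 for perfectly correlated profiles to 2 for perfectly anti-correlated profiles, and the resulting matrix can be passed to any standard clustering procedure.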

Nucleic Acid Hybridization

[0149] Following production of an array of nucleic acids corresponding to expressed

RNA products, expression is evaluated for a set of probes. Each of the probes in a set is 

composed of a unique defined sequence of polynucleotides. Different members of a probe 

set can be either related or unrelated polynucleotide sequences, and commonly correspond 

to polynucleotide sequences associated with disease related genes or targets. Frequently,
the defined sequence probes are synthetic oligonucleotides, although alternative synthetic 
probes are also suitable, e.g., cDNA probes, restriction fragments, amplification products, 
and the like. Hybridization of the plurality of defined sequence probes occurs in a single 
reaction mixture (hybridization mixture). Differential detection of the different probes is 

made possible by the inclusion of a different label or signal generating moiety in each probe. For
example, different defined sequence probes to be analyzed simultaneously in a single 
hybridization reaction can include different fluorescent labels which can be distinguished on 
the basis of their emission spectra. Alternatively, each defined sequence probe can 
incorporate an amplifiable signal element, e.g., an oligonucleotide sequence which can be 

amplified in a subsequent amplification reaction incorporating a fluorescent or other
detectable moiety. 

[0150] Nucleic acids "hybridize" when they associate, typically in solution. Nucleic 

acids hybridize due to a variety of well characterized physico-chemical forces, such as 
hydrogen bonding, solvent exclusion, base stacking and the like. An extensive guide to the 
hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in

Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes , part I, 
chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe 
assays," (Elsevier, New York), as well as in Ausubel, supra. Hames and Higgins (1995) 
Gene Probes 1, IRL Press at Oxford University Press, Oxford, England (Hames and Higgins
1) and Hames and Higgins (1995) Gene Probes 2, IRL Press at Oxford University Press,
Oxford, England (Hames and Higgins 2) provide details on the synthesis, labeling, detection
and quantification of DNA and RNA, including oligonucleotides.
[0151] "Stringent hybridization wash conditions" in the context of nucleic acid

hybridization experiments, such as Southern and northern hybridizations, are sequence 
dependent, and are different under different environmental parameters. An extensive guide 
to the hybridization of nucleic acids is found in Tijssen (1993), supra, and in Hames and 
Higgins 1 and Hames and Higgins 2, supra. 

[0152] For purposes of the present invention, generally, "highly stringent"
hybridization and wash conditions are selected to be about 5° C or less below the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH (as
noted below, highly stringent conditions can also be referred to in comparative terms). The
Tm is the temperature (under defined ionic strength and pH) at which 50% of the test
sequence hybridizes to a perfectly matched primer. Very stringent conditions are selected
to be equal to the Tm for a particular primer.

[0153] The Tm of a nucleic acid duplex indicates the
temperature at which the duplex is 50% denatured under the given conditions, and it
represents a direct measure of the stability of the nucleic acid hybrid. Thus, the Tm
corresponds to the temperature at the midpoint in the transition from helix to
random coil; it depends on length, nucleotide composition, and ionic strength for long
stretches of nucleotides.
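
The dependence of the melting temperature on length, composition and ionic strength can be illustrated with a widely quoted empirical approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 675/N. This formula is not part of the present disclosure; it is used here only as an assumed rule of thumb, its constants vary between sources, and it is a rough guide rather than a substitute for empirical determination. The function name and default salt concentration are hypothetical.

```python
import math


def estimate_tm(sequence: str, na_molar: float = 0.05) -> float:
    """Rough melting temperature (degrees C) from a common empirical formula.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    Assumed approximation, not taken from the disclosure.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0 or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string of A, C, G, T")
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


# Example: a 20-mer with 50% GC content at 50 mM monovalent salt.
tm = estimate_tm("ATGCATGCATGCATGCATGC", na_molar=0.05)
```

The estimate rises with GC content, with salt concentration and with length, which matches the qualitative dependence described above.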

[0154] After hybridization, unhybridized nucleic acid material can be removed by a 

series of washes, the stringency of which can be adjusted depending upon the desired 

results. Low stringency washing conditions (e.g., using higher salt and lower temperature)
increase sensitivity, but can produce nonspecific hybridization signals and high background
signals. Higher stringency conditions (e.g., using lower salt and higher temperature that is
closer to the hybridization temperature) lower the background signal, typically with only
the specific signal remaining. See Rapley, R. and Walker, J.M. eds., Molecular
Biomethods Handbook (Humana Press, Inc. 1998) (hereinafter "Rapley and Walker"),
which is incorporated herein by reference in its entirety for all purposes. 
[0155] Thus, one measure of stringent hybridization is the ability of the probe to 

hybridize to one or more of the target nucleic acids (or complementary polynucleotide 
sequences thereof) under highly stringent conditions. Stringent hybridization and wash
conditions can easily be determined empirically for any test nucleic acid. 
[0156] For example, in determining highly stringent hybridization and wash 

conditions, the stringency of the hybridization and wash conditions is gradually increased (e.g., by
increasing temperature, decreasing salt concentration, increasing detergent concentration
and/or increasing the concentration of organic solvents, such as formalin, in the
hybridization or wash), until a selected set of criteria is met. For example, the
stringency of the hybridization and wash conditions is gradually increased until a target nucleic acid, and
complementary polynucleotide sequences thereof, binds to a perfectly matched
complementary nucleic acid.

[0157] A target nucleic acid is said to specifically hybridize to a probe (or primer) 

nucleic acid when it hybridizes at least ½ as well to the probe as to a perfectly matched
complementary target, i.e., with a signal to noise ratio at least ½ as high as hybridization of
the probe to the target under conditions in which the perfectly matched probe binds to the
perfectly matched complementary target with a signal to noise ratio that is at least about
2.5x-10x, typically 5x-10x as high as that observed for hybridization to any of the
unmatched target nucleic acids. 

Labels 

[0158] In the methods of the present invention, multiple probes, each of defined 

sequence, and each of which is capable of giving rise to a different detectable signal, are

hybridized simultaneously, i.e., in a single reaction, to a nucleic acid array. In one favorable 
embodiment, the probes are each labeled with a different fluorescent chromophore. A
fluorescent label may be covalently attached, noncovalently intercalated, or may be an
energy transfer label. Other useful labels include mass labels, which are incorporated into
amplification products and released after the reaction for detection, chemiluminescent
labels, electrochemical and infrared labels, isotopic derivatives, nanocrystals, or any of
various enzyme-linked or substrate-linked labels detected by the appropriate enzymatic
reaction.

[0159] One preferred embodiment of the methods of the present invention includes 

the use and detection of one or more fluorescent labels. Generally, fluorescent molecules
each display a distinct emission spectrum, thereby allowing one to employ a plurality of 
fluorescent labels in a single mixed probe reaction, and then separate the mixed data into its 
component signals by spectral deconvolution. Exemplary fluorescent labels for use in the 
methods of the present invention include a single dye covalently attached to the molecule
being detected, a single dye noncovalently intercalated into product DNA, or an energy-transfer
fluorescent label. Numerous suitable combinations of fluorescent labels are known
in the art, and available from commercial sources (e.g., Molecular Probes, Eugene, Oregon;
Sigma, St. Louis, Missouri).
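
Separation of mixed fluorescence data into component signals can be illustrated for the simplest case of two dyes read in two detection channels. The sketch below assumes that each channel reading is a linear sum of the dye contributions and that the per-dye channel responses have been measured beforehand from single-dye controls; the function name and all numerical values are hypothetical.

```python
def unmix_two_dyes(channel_signals, dye1_response, dye2_response):
    """Recover the amounts of two dyes from readings in two channels.

    channel_signals: (s1, s2), the measured signal in channels 1 and 2.
    dye1_response:   (a11, a21), signal per unit of dye 1 in channels 1 and 2.
    dye2_response:   (a12, a22), signal per unit of dye 2 in channels 1 and 2.
    Solves s1 = a11*c1 + a12*c2 and s2 = a21*c1 + a22*c2 by Cramer's rule.
    """
    s1, s2 = channel_signals
    a11, a21 = dye1_response
    a12, a22 = dye2_response
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("dye responses are too similar to separate")
    c1 = (s1 * a22 - a12 * s2) / det
    c2 = (a11 * s2 - a21 * s1) / det
    return c1, c2


# Example: dye 1 bleeds 10% into channel 2 and dye 2 bleeds 20% into channel 1.
amounts = unmix_two_dyes(
    channel_signals=(100.0, 50.0),
    dye1_response=(0.9, 0.1),
    dye2_response=(0.2, 0.8),
)
```

With more dyes than two, the same linear model is typically solved by least squares over all channels; dyes whose emission spectra overlap heavily make the system poorly conditioned, which is one reason to select dyes with well separated spectra.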

[0160] For example, fluorescent moieties, including Alexa Fluor 350, Alexa Fluor 

405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor
546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 647, Alexa Fluor
660, Alexa Fluor 680, AMCA, BODIPY 630/650, BODIPY 650/665, BODIPY-FL,
BODIPY-R6G, BODIPY-TMR, BODIPY-TRX, carboxyfluorescein, Cascade Blue, Cy3,
Cy5, Cy5.5, 6-FAM, Fluorescein, HEX, 6-JOE, Lissamine rhodamine B, Oregon Green
488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green,
Rhodamine Red, ROX, SpectrumAqua, TAMRA, TET, Tetramethylrhodamine, and Texas
Red, are generally known in the art and routinely used for identification of discrete nucleic
acid species, such as in sequencing reactions. One of skill in the art can easily select dyes
having different emission spectra, enabling detection of differently labeled probes
hybridized to the same nucleic acid array. One suitable combination compatible with many
common lasers and filters includes, e.g., Fluorescein, Texas Red, Cy3, and Cy5, or a
combination of, e.g., Alexa Fluor dyes according to the manufacturer's instructions
(Molecular Probes, Eugene, Oregon).

[0161] The signal strength obtained from fluorescent dyes can be enhanced through 

use of related compounds called energy transfer (ET) fluorescent dyes. After absorbing 
light, ET dyes have emission spectra that allow them to serve as "donors" to a secondary 
"acceptor" dye that will absorb the emitted light and emit a lower energy fluorescent signal. 

Use of these coupled-dye systems can significantly amplify fluorescent signal. Examples of
ET dyes include the ABI PRISM BigDye terminators, recently commercialized by Perkin-Elmer
Corporation (Foster City, CA) for applications in nucleic acid analysis. These
chromophores incorporate the donor and acceptor dyes into a single molecule: an energy
transfer linker couples a donor fluorescein to a dichlororhodamine acceptor dye, and the
complex is attached to a DNA replication primer. Alternatively, signals corresponding to
hybridization of a probe to a nucleic acid can be amplified using anti-dye antibodies, or
enzyme mediated amplification strategies, such as tyramide signal amplification and
enzyme labeled fluorescence (ELF) technologies (Molecular Probes, Eugene, Oregon;
additional details can be found in the Molecular Probes handbook and in product literature).
[0162] Enzyme-linked reactions theoretically yield an unlimited signal, due to
amplification of the signal by enzymatic activity. In this embodiment, an enzyme is linked
to a secondary group that has a strong binding affinity to the molecule of interest.
Following hybridization of an enzyme-linked probe to the nucleic acid array, hybridization
is detected by a chemical reaction catalyzed by the associated enzyme. Various coupling
strategies are possible utilizing well-characterized interactions generally known in the art,
such as those between biotin and avidin, an antibody and antigen, or a sugar and lectin.
Various types of enzymes can be employed, generating colorimetric, fluorescent,
chemiluminescent, phosphorescent, or other types of signals. Following hybridization to an
enzyme-linked probe, a chemical reaction is conducted, and bound enzyme is detected by
monitoring the reaction product. The secondary affinity group may also be coupled to an
enzymatic substrate, which is detected by incubation with unbound enzyme. One of skill in
the art can conceive of many possible variations on enzyme-linked labeling methods.

[0163] Alternatively, technologies such as the use of nanocrystals as a fluorescent
DNA label (Alivisatos, et al. (1996) Nature 382:609-11) can be employed in the methods of
the present invention. Another method, described by Mazumder, et al. (Nucleic Acids Res.
(1998) 26:1996-2000), involves hybridization of a labeled oligonucleotide probe to its
target without physical separation from unhybridized probe. In this method, the probe is
labeled with a chemiluminescent molecule that in the unbound form is destroyed by sodium
sulfite treatment, but is protected in probes that have hybridized to target sequence.
[0164] Other embodiments of labeling include mass labels, which are incorporated
into amplification products and released after the reaction for detection; chemiluminescent,
electrochemical, and infrared labels; radioactive isotopes; and any of various enzyme-linked
or substrate-linked labels detectable by the appropriate enzymatic reaction. Many other
useful labels are known in the art, and one skilled in the art can envision additional
strategies for labeling amplification products of the present invention.
[0165] Alternatively, the defined sequence probe can include an amplifiable signal
element, for example a polynucleotide sequence which can serve as the template in a
subsequent amplification reaction, such as rolling circle amplification (RCA), ramification
amplification (RAM), branched DNA amplification (BDA), hybridization signal
amplification method (HSAM), and 3DNA dendrimer probes (Genisphere, Hatfield, PA).
Additional methods for amplifying a signal include those described in, e.g., United States
Patents 6,251,639 and 5,545,522. The use of defined sequence probes incorporating
amplifiable signal elements is particularly favored when the array comprises RNA or cDNA
corresponding to expressed nucleic acids.

Detection Methods

[0166] Following hybridization of the defined sequence probes to the nucleic acid
array, hybridization between the probes and the nucleic acids of the array is detected,
and optionally quantitated. Some embodiments of the methods of the present
invention enable direct detection of products. Other embodiments detect reaction products
via a label associated with one or more of the probes.

[0167] A variety of detectors are commercially available, including, e.g., optical and
fluorescent detectors, optical and fluorescent microscopes, plate readers, CCD arrays,
phosphorimagers, scintillation counters, phototubes, photodiodes, and the like. Software
is also available for digitizing, storing and analyzing digitized video, digitized optical, or
other assay results, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™,
WINDOWS™, WINDOWS NT™ or WINDOWS95™ based machines), MACINTOSH™,
or UNIX based (e.g., SUN™ work station) computers.

[0168] One described approach for quantifying fluorescence is to use a
photomultiplier tube detector combined with a laser light scanner. Fluorescence imaging
can also be performed using a charge-coupled device camera combined, e.g., with a UV light
or xenon arc source. Fluorescent dyes with bimodal excitation spectra may be broadly
implemented on a wide range of analytical imaging devices, permitting their widespread
application to analysis of expression data (e.g., signals corresponding to hybridization
between labeled probes and arrayed nucleic acids corresponding to expression products) in
semiautomated analysis environments.

[0169] For example, the Perkin Elmer ScanArray Express microarray scanner is
capable of monitoring up to 5 dyes simultaneously, and is favorably employed in the
methods of the present invention.

Systems for Gene Expression Analysis 
[0170] The present invention also provides an integrated system for evaluating gene
expression. The integrated system typically includes a logical or spatial array, e.g., a
microarray organized on a glass slide, incorporating nucleic acid samples corresponding to a
plurality of expressed RNA products derived from multiple biological sources or samples,
e.g., cell lines, tissues, organ biopsies, organisms, etc. Optionally, the integrated system can
include various components for preparation and collection of such biological samples, e.g.,
providing such functions as cell culture, most commonly in multi-well plates, e.g., 96, 384,
768 or 1536 well plates (available from various suppliers such as VWR Scientific Products,
West Chester, PA). Components and systems for automating the entire process, e.g.,
sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the
microplate in detector(s), are commercially available, and can be employed in the context of
the systems of the present invention (see, e.g., Zymark Corp., Hopkinton, MA; Air
Technical Industries, Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision
Systems, Inc., Natick, MA, etc.). These configurable systems provide high throughput and
rapid start up as well as a high degree of flexibility and customization. Similarly, arrays and
array readers are available, e.g., from Affymetrix, PE Biosystems, and others.
[0171] The manufacturers of such systems provide detailed protocols for the various
high throughput applications. Thus, for example, Zymark Corp. provides technical bulletins
describing screening systems for detecting the modulation of gene transcription, ligand
binding, and the like.

[0172] For example, the system favorably includes a module for RNA isolation.
Two commercially available platforms useful in the context of the present invention are
those marketed by Qiagen and GenoVision. Qiagen protocols using the 96-well
RNeasy product and vacuum filtration can be performed using, e.g., a BioMek Multimek
96-tip pipetting system. This product and protocol isolate total RNA. Alternatively, the
GenoVision GenoM-48 and GenoM-96 systems, which are capable of isolating mRNA using
polyT-conjugated magnetic beads for 48 or 96 samples at a time, can be employed for RNA
isolation from biological samples. Unlike the Qiagen process, which requires user
intervention to swap plates, the GenoVision process is fully automated.

[0173] The system typically includes an amplification module for producing a
plurality of amplification products from a pool of expressed RNA products (e.g., expressed
RNA products obtained from a biological sample); a detection module for detecting one or
more members of the plurality of amplification products and generating a set of gene
expression data; and an analyzing module for organizing and/or analyzing the data points in
the data set. Any or all of these modules can comprise high throughput technologies and/or
systems.
[0174] For example, the amplification module of the system of the present invention
produces a plurality of amplification products from an expressed RNA sample. Optionally,
the amplification module includes at least one pair of universal primers and at least one pair
of target-specific primers for use in the amplification process, as described above.
Furthermore, the amplification module can include components to perform one or more of
the following reactions: a polymerase chain reaction (e.g., an rtPCR, a multiplex PCR, etc.),
a transcription-based amplification, a self-sustained sequence replication, a nucleic acid
sequence based amplification, a ligase chain reaction, a ligase detection reaction, a strand
displacement amplification, a repair chain reaction, a cyclic probe reaction, a rapid
amplification of cDNA ends, an invader assay, a bridge amplification, a rolling circle
amplification, solution phase and/or solid phase amplifications, and the like.
[0175] The system also includes a hybridization module for contacting a plurality of
differently labeled defined sequence probes with the nucleic acid microarray. The
hybridization module commonly includes an incubation chamber or coverslip for
maintaining conditions suitable for hybridization in solution of the plurality of probes with
the nucleic acids disposed on the microarray. Optionally, the hybridization module
accommodates additional reagents and reactions for amplifying the hybridization signal.
Alternatively, a separate module is included for purposes of amplifying the hybridization
signal.

[0176] The detection module detects the presence, absence, or quantity of
hybridization between the plurality of probes and the microarray. Additionally, the
detection module generates a set of gene expression data, generally in the form of a plurality
of data points. Most commonly, the data points are recorded in a database. Typically, the
data points are recorded in a computer readable medium, i.e., to generate a computer based
database.

[0177] The third component of the system of the present invention, the analyzing
module, is in operational communication with the detection module. The analyzing module
of the system includes, e.g., a computer or computer-readable medium having one or more
logical instructions for analyzing the plurality of data points generated by the
detection system. The analyzing system optionally comprises multiple logical instructions;
for example, the logical instructions can include one or more instructions which organize
the plurality of data points into a database and one or more instructions which analyze the
plurality of data points. The instructions can include software for performing one or more
statistical analyses on the plurality of data points. Additionally (or alternatively), the
instructions can include or be embodied in software for generating a graphical
representation of the plurality of data points. For example, Silicon Genetics' GeneSpring
software is one suitable software program for use in the context of the present invention.
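The two kinds of logical instructions described above (organizing data points into a database, then analyzing them) can be sketched in Python as follows. This is a minimal illustration only: the table layout, the column names, and the choice of a per-gene mean as the "analysis" are assumptions made for the example, not features recited in this disclosure, and the built-in `sqlite3` module stands in for whatever database software is actually used.

```python
import sqlite3


def build_database(points):
    """Organize (sample_id, gene, signal) data points into an in-memory database.

    The schema is hypothetical and chosen only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_points (sample_id TEXT, gene TEXT, signal REAL)"
    )
    conn.executemany("INSERT INTO data_points VALUES (?, ?, ?)", points)
    conn.commit()
    return conn


def mean_signal_by_gene(conn):
    """Analyze the stored data points: average signal for each gene."""
    rows = conn.execute(
        "SELECT gene, AVG(signal) FROM data_points GROUP BY gene ORDER BY gene"
    )
    return {gene: avg for gene, avg in rows}


if __name__ == "__main__":
    example = [
        ("sample_1", "GAPDH", 100.0),
        ("sample_2", "GAPDH", 200.0),
        ("sample_1", "IL-8", 50.0),
    ]
    print(mean_signal_by_gene(build_database(example)))
```

Keeping the organizing step and the analyzing step as separate functions mirrors the separation of instructions described in the paragraph above; any statistical or graphical routine could replace the simple mean.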
[0178] The computer employed in the analyzing module of the present invention can
be, e.g., a PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™,
WINDOWS NT™, WINDOWS95™, WINDOWS 98™, or WINDOWS ME™), a LINUX
based machine, a MACINTOSH™, Power PC, or a UNIX based machine (e.g., SUN™
work station) or other commercially common computer which is known to one of skill.
Software for computational analysis is available, or can easily be constructed by one of skill
using a standard programming language such as VisualBasic, Fortran, Basic, C, C++, Java,
or the like. Standard desktop applications such as word processing software (e.g., Microsoft
Word™ or Corel WordPerfect™) and database software (e.g., spreadsheet software such as
Microsoft Excel™, Corel Quattro Pro™, or database programs such as Microsoft Access™
or Paradox™) can also be used in the analyzing system of the present invention.

[0179] The computer optionally includes a monitor that is often a cathode ray tube
("CRT") display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal
display), or others. Computer circuitry is often placed in a box that includes numerous
integrated circuit chips, such as a microprocessor, memory, interface circuits, and others.
The box also optionally includes a hard disk drive, a floppy disk drive, a high capacity
removable drive such as a writeable CD-ROM, and other common peripheral elements.
Inputting devices such as a keyboard or mouse optionally provide for input from a user.
[0180] The computer typically includes appropriate software for receiving user
instructions, either in the form of user input into a set of parameter fields, e.g., in a GUI, or in
the form of preprogrammed instructions, e.g., preprogrammed for a variety of different
specific operations. The software then converts these instructions to appropriate language
for instructing the operation of the fluid direction and transport controller to carry out the
desired operation.

[0181] The software can also include output elements for displaying and/or further
analyzing raw data, massaged data, or proposed results from one or more computational
processes involved in the analysis of the gene expression data set.

Kits

[0182] In an additional aspect, the present invention provides kits embodying the
methods, compositions, and systems for analysis of gene expression as described herein. For
example, a kit of the present invention can include one or more microarray slides (or
alternative microarray format) onto which a plurality of different nucleic acid samples, each
corresponding to an expressed RNA sample obtained from biological samples, e.g., samples
treated with members of a compound library, have been deposited. The kit can also include
a plurality of labeled probes. Alternatively, the kit can include a plurality of polynucleotide
sequences suitable as probes and a selection of labels suitable for customizing the included
polynucleotide sequences, or other polynucleotide sequences at the discretion of the
practitioner. Commonly, at least one included polynucleotide sequence corresponds to a
control sequence, e.g., β-actin, a "housekeeping" gene, or the like. Exemplary labels
include, but are not limited to, a fluorophore, a dye, a radiolabel, an enzyme tag, etc., that is
linked to a nucleic acid primer itself.

[0183] In one embodiment, kits that are suitable for amplifying nucleic acid
corresponding to the expressed RNA samples are provided. Such a kit includes reagents
and primers suitable for use in any of the amplification methods described above.
Alternatively, or additionally, the kits are suitable for amplifying a signal corresponding to
hybridization between a probe and a target nucleic acid sample (e.g., deposited on a
microarray).

[0184] In addition, one or more materials and/or reagents required for preparing a
biological sample for gene expression analysis are optionally included in the kit.
Furthermore, optionally included in the kits are one or more enzymes suitable for
amplifying nucleic acids, including various polymerases (RT, Taq, etc.), one or more
deoxynucleotides, and buffers to provide the necessary reaction mixture for amplification.
[0185] Typically, the kits are employed for analyzing gene expression patterns using
mRNA as the starting template. The mRNA template may be presented as either total
cellular RNA or isolated mRNA; both types of sample yield comparable results. In other
embodiments, the methods and kits described in the present invention allow quantitation of
other products of gene expression, including tRNA, rRNA, or other transcription products.
[0186] Optionally, the kits of the present invention further include software to
expedite the generation, analysis and/or storage of data, and to facilitate access to databases.
The software includes logical instructions, instruction sets, or suitable computer programs
that can be used in the collection, storage and/or analysis of the data. Comparative and
relational analysis of the data is possible using the software provided.
[0187] The kits optionally comprise distinct containers for each individual reagent
and/or enzyme component. Each component will generally be suitable as aliquoted in its
respective container. The container of the kits optionally includes at least one vial, ampule,
or test tube. Flasks, bottles and other container mechanisms into which the reagents can be
placed and/or aliquoted are also possible. The individual containers of the kit are preferably
maintained in close confinement for commercial sale. Suitable larger containers may
include injection or blow-molded plastic containers in which the desired vials are retained.
Instructions, such as written directions or videotaped demonstrations detailing the use of the
kits of the present invention, are optionally provided with the kit.
[0188] In a further aspect, the present invention provides for the use of any 

composition or kit herein, for the practice of any method or assay herein, and/or for the use 
of any apparatus or kit to practice any assay or method herein. 

EXAMPLES

[0189] The following examples are offered to illustrate, but not to limit, the claimed
invention. It is understood that the following examples and embodiments described herein
are for illustrative purposes only and that various modifications or changes in light thereof
will be suggested to persons skilled in the art and are to be included within the spirit and
purview of this application and scope of the appended claims.

EXAMPLE 1: OUTLINE OF ANALYSIS 

[0190] A set of RNA samples (e.g., mRNA samples), each of which is derived from
a biological sample, e.g., cells exposed to members of a compound library, is either
selectively or globally amplified, optionally by >3 logs, to generate cDNA (optionally
amplified RNA) populations biased toward a subset of the total RNA population, the entire
mRNA population or the entire RNA population. cDNA populations for a plurality of
biological samples are spotted onto arrays, preferably optical arrays, e.g., glass slides.
Arrays are then probed using a plurality of defined sequence probes (e.g., gene specific
nucleic acid probes linked to a label). The label is optionally covalently attached to the
probes and is optionally a fluorescent tag. Other labels and labeling techniques known in the
art may be used. Each of the probes is capable of giving rise to a different detectable signal,
e.g., is linked to a different fluorescent label. Following hybridization, the arrays are
washed to remove unhybridized probe, and a signal corresponding to hybridization between
the bound probe and the nucleic acid samples on the microarray is detected.
[0191] In an embodiment of this method, the number of biological samples to be
analyzed (and optionally, compared for gene expression) exceeds 96 biological samples.
Commonly, more than 960 samples are processed and analyzed on a single
microarray. In still further embodiments, more than 9,600 samples are analyzed and
compared on one or more microarrays.

EXAMPLE 2: AMPLIFICATION OF RNA 

[0192] Typically, it is desirable to increase the amount of nucleic acid via
amplification of the RNA population, to provide nucleic acids for spotting in microarrays.
While it is envisioned that there will be improvements over time in the sensitivity of
microarray detection and analysis, it is generally preferable with current detection strategies
and instrumentation to amplify a population of nucleic acids corresponding to one or more
species in an expressed RNA sample.
[0193] Numerous methods are known in the art for the amplification of nucleic
acids in general and RNA specifically. Examples of amplification techniques include PCR,
NASBA, TMA, RCA, as well as alternative amplification methods, e.g., as described in
Puskas et al. (2002) Biotechniques 32:1330-1334; Eberwine (1996) Biotechniques 20:584-591;
Van Gelder et al. (1990) Proc Natl Acad Sci USA 87:1663-1667; and in United States
Patents Numbers: 6,251,639, 6, 5,962,271 and 5,545,522. The majority of these methods
can be used for either global amplification of nucleic acids, e.g., using random priming
and/or poly T priming, or specific amplification using gene or gene family targeted priming.

Global amplification 

[0194] Numerous methods have been described for global amplification of an
mRNA population. These techniques include various permutations of oligonucleotide
polyT primed reverse transcriptions followed by various polymerization schemes using
random or semi-random primers (i.e., random or degenerate oligomer primers) to amplify
the cDNA population in toto. Methods may use DNA polymerases or a combination of
DNA and RNA polymerases.
[0195] The primary advantage of global amplification is that the samples, once
placed in the array, can be probed for virtually any gene. Multiple arrays can also be
generated simply by replication to scale up the process for as many genes as desired. The
global amplification approach simplifies the processes associated with preparing samples
for arraying since only one protocol and set of reagents is required. In addition, only one
purification/desalting process is required per original RNA sample. All of these elements
can lead to a lower overall cost for running the process.

[0196] Global amplification has some disadvantages as well. These disadvantages
relate to the fact that there is less of each amplified gene within a given sample. The
reduced quantity of a given gene may reduce the sensitivity of the assay for that gene
relative to some of the more targeted amplification techniques. The presence of an
abundance of other genes and their associated nucleic acids also means that there is a higher
potential for cross reactivity during the probe hybridization, requiring a more careful
selection and analysis of probes during the experimental design phase. Thus, for genes
expected to be highly expressed, global amplification is often preferable, whereas, in the
case of target sequences expressed at lower levels, a greater demand for sensitivity can
make selective amplification protocols preferable.

Selective amplification 

[0197] Virtually all of the amplification techniques that can be applied to amplifying
mRNA may be used to selectively amplify a subpopulation of genes within a given RNA
sample. These methods can be as selective as targeting amplification to only a few genes,
e.g., the use of rtPCR and a minimal set of gene specific primers, or can target a fairly large
partition of sequences, e.g., the use of a random set of indexed primers as is common with
differential display techniques.

[0198] The use of selective amplification has a number of advantages as compared
to global amplification. One advantage is the enhancement of the quantities of the specific
genes being amplified versus the whole RNA population. This enhancement leads to
greater concentrations of the genes to be measured and potentially improved sensitivities
and specificities in probe hybridization.

[0199] A second advantage of selective amplification is that it can increase the level
of multiplexing at the probing level. For example, probing using fluorescently labeled
oligonucleotide probes limits the total number of probes that can be detected in parallel to the
total number of fluorescent chromophores that can be uniquely detected and quantitated by
the fluorescence detection system, e.g., commonly no more than 4-5 different chromophores
per experiment. Globally amplified RNA products containing copies of all expressed genes
can only be probed for a maximum of 4-5 genes at a time, per physical array, meaning that
if one wishes to probe for 16-20 different genes, one would need to spot the RNAs onto a
minimum of 4 different arrays and probe them independently.
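The arithmetic behind the "minimum of 4 different arrays" figure can be written out as a short Python sketch. The function name is hypothetical and the calculation assumes, as the paragraph above does, that each physical array of globally amplified product can be probed for at most one gene per distinguishable chromophore.

```python
import math


def arrays_needed(n_genes, n_chromophores):
    """Minimum number of replicate arrays when each array resolves at most
    n_chromophores genes (one gene per distinguishable label)."""
    if n_genes < 0 or n_chromophores < 1:
        raise ValueError("need n_genes >= 0 and n_chromophores >= 1")
    return math.ceil(n_genes / n_chromophores)


if __name__ == "__main__":
    print(arrays_needed(16, 4))  # 16 genes, 4 chromophores
    print(arrays_needed(20, 5))  # 20 genes, 5 chromophores
```

Both ends of the 16-20 gene range quoted above give four arrays when 4 or 5 chromophores, respectively, are available.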

[0200] Selective amplification allows for the differential amplification of genes in
different samples. Amplification methods that select and amplify certain subsets or
subpopulations can be used to partition the RNA into multiple groupings or pools. These
groupings provide samples with reduced sequence complexity that offer advantages in
probing accurately and selectively for particular sequences. In addition, the different
partitioned groups can be arrayed on the same surface since there will be limited cross
interaction of gene specific probes to the different subpopulations.

[0201] For example, genes A-D can be amplified in one reaction, while E-H, I-L,
and M-P are amplified in separate reactions. The products of these amplifications can all be
spotted on a single array, with each amplification occupying a different spot in the array,
and then the array can be probed using probes for all 16 genes simultaneously, wherein
probes for genes A, E, I and M all use the first fluorescent chromophore, B, F, J, and N the
second chromophore, C, G, K, and O the third chromophore, and so on.
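The pooling scheme in the A-P example can be sketched as a simple index calculation. In this illustrative Python snippet, each gene is assigned an amplification pool (which determines its spot) and a chromophore channel, so that no two genes share both a spot and a channel; the function name and the zero-based numbering are assumptions made for the example.

```python
def assign_pools_and_channels(genes, n_channels):
    """Map each gene to (pool_index, channel_index).

    Consecutive groups of n_channels genes share one amplification pool
    (and therefore one spot); within a pool each gene gets its own channel.
    """
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    return {
        gene: (i // n_channels, i % n_channels)
        for i, gene in enumerate(genes)
    }


if __name__ == "__main__":
    layout = assign_pools_and_channels(list("ABCDEFGHIJKLMNOP"), 4)
    for gene, (pool, channel) in layout.items():
        print(f"gene {gene}: pool {pool + 1}, chromophore {channel + 1}")
```

With 16 genes and 4 channels this reproduces the assignment described above: genes A, E, I and M fall in pools 1 to 4 and all use the first chromophore, while every (pool, chromophore) pair is used exactly once.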

EXAMPLE 3: ON-CHIP SIGNAL AMPLIFICATION 

[0202] Sensitivity under currently available detection platforms can also be
increased using signal amplification following the on-chip hybridization step. There are
numerous schemes known in the art for signal amplification. In the case of signal
amplification on a microarray, the amplified signal remains localized spatially within the
array. Example amplification schemes that can be used include rolling circle amplification
(RCA), Ramification Amplification Method (RAM), branched DNA amplification (BDA),
Hybridization Signal Amplification Method (HSAM), 3DNA dendrimer probes, various
fluorescence enhancing schemes, and a number of enzyme-linked signal amplification
schemes including various chemiluminescence, fluorescence and colorimetric approaches.
Virtually all of these schemes have been demonstrated to work in the microarray format and
provide anywhere from one to five logs of signal amplification. In this invention, it is
preferred to use a signal amplification scheme that provides three logs or greater
amplification.
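For readers less used to the "logs" convention, one log corresponds to a ten-fold increase, so the one-to-five log range quoted above spans 10-fold to 100,000-fold amplification. A minimal Python sketch of the conversion follows; the function names are hypothetical, and the three-log default simply restates the preference given in the preceding paragraph.

```python
def fold_from_logs(logs):
    """Convert an amplification expressed in logs (base 10) to a fold change."""
    return 10 ** logs


def meets_preference(logs, min_logs=3):
    """True if a scheme provides at least the preferred number of logs."""
    return logs >= min_logs


if __name__ == "__main__":
    for logs in (1, 3, 5):
        print(logs, fold_from_logs(logs), meets_preference(logs))
```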

EXAMPLE 4: SCREENING OF A COMPOUND LIBRARY

[0203] Screening of a compound library is schematically described in Figure 4. The
process involves several principal steps, all of which allow the samples to be handled in
parallel in microtiter plate or microarray format. Following the acquisition and lysis of
cells, the steps involved are (1) RNA isolation, (2) multiplexed rtPCR, (3) DNA isolation,
(4) spotting of the PCR products on the array, and (5) gene-specific probing and detection.

Universal-Primer-Based rtPCR 
[0204] A population of nucleic acids corresponding to the expressed RNA sample
obtained from the cells is generated using a multiplexed rtPCR process. rtPCR using a
targeted amplification strategy is performed both for the large gain in sensitivity and
because it reduces the complexity of the sample to be analyzed on the array. The use of
targeted amplification of a small set of genes, versus one of the global RNA amplification
methods such as those described by Puskas et al. (2002) Biotechniques 32(6):1330-1340
and Van Gelder et al. (1990) PNAS 87(5):1663-1667, ensures the maximum level of
discrimination, limiting cross hybridization to one or more amplified homologous or
partially homologous genes.

[0205] PCR and rtPCR can be used to amplify a multiplex of targets using very
small amounts of material. This utility has been taken advantage of for a variety of
applications including genotyping and gene expression. In many cases, especially gene
expression, it is desirable to quantitate the relative expression levels for the different nucleic
acid targets. However, standard multiplex rtPCR is not typically quantitative. Significant
biases can be introduced during the exponential amplification that lead to varied and
nonreproducible data. These biases result from primer-primer interactions, primer-product
cross-reactions, and from concentration and sequence-dependent variations in amplification
efficiency, most notably seen in the latter part or plateau phase of thermal cycling. To
overcome these deficiencies, the present invention provides a modified rtPCR process that
converts PCR to a two-primer process using universal primers.

[0206] The modified rtPCR process uses a combined gene-specific, universal
priming strategy that overcomes the primary deficiencies of rtPCR without compromising
the detection sensitivity that is gained by using the process. The strategy is outlined in
Figures 8A and B. Key to the process is the conversion of the multiplex amplification
process from one involving tens of primers to one using only two primers. The reaction
initializes using gene-specific primers that are capable of specifically detecting each target
mRNA. In the first stage (1), chimeric primers comprising both a gene-specific sequence
and, on their 5' ends, a consensus or universal sequence, are employed. During the first few
cycles of amplification the specific gene targets are amplified by these chimeric primers,
creating products that are tailed with the universal primer sequence.
[0207] The reactions all carry a pair of universal primers present at significantly
higher concentrations, e.g., a universal:chimeric gene-specific primer ratio of 50:1 (1 µM
universal:0.02 µM gene specific). Therefore, as PCR progresses the amplification is
quickly taken over (2) by the pair of universal primers. This transition from the use of
many primers to only two effectively collapses the level of reaction complexity and locks in
the relative concentrations of the different gene targets. In the universal primer
amplification reaction (shown in Figure 8B) all the products are effectively the same
chemical species and are not differentially amplified. Thus, the relative gene ratios can be
maintained even as the reaction pushes into the plateau phase.
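The idea that a shared primer pair "locks in" relative gene concentrations can be illustrated with a deliberately simplified model. The Python sketch below assumes idealized exponential amplification, in which each product grows by a factor of (1 + efficiency) per cycle; this toy model, the efficiency values, and the function names are illustrative assumptions and are not a description of the reaction kinetics disclosed here. It ignores plateau effects, primer depletion, and the initial gene-specific cycles.

```python
# Primer concentrations quoted in the text (micromolar).
UNIVERSAL_UM = 1.0
GENE_SPECIFIC_UM = 0.02


def amplify(start_copies, efficiency, cycles):
    """Idealized exponential amplification: copies * (1 + efficiency)^cycles."""
    return start_copies * (1.0 + efficiency) ** cycles


def ratio_after(start_a, start_b, eff_a, eff_b, cycles):
    """Ratio of product A to product B after the given number of cycles."""
    return amplify(start_a, eff_a, cycles) / amplify(start_b, eff_b, cycles)


if __name__ == "__main__":
    print("universal:gene-specific ratio:", UNIVERSAL_UM / GENE_SPECIFIC_UM)
    # Shared universal primers: both products amplify with one efficiency,
    # so the starting 2:1 ratio is carried through unchanged.
    print("shared efficiency:", ratio_after(2.0, 1.0, 0.90, 0.90, 30))
    # Separate primer pairs with slightly different (hypothetical)
    # efficiencies: the ratio drifts away from 2:1 as cycles accumulate.
    print("different efficiencies:", ratio_after(2.0, 1.0, 0.95, 0.85, 30))
```

In this model the ratio is preserved exactly whenever the two efficiencies are equal, which is the behavior the universal primer pair is intended to approximate; even a small efficiency difference compounds over 30 cycles.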

[0208] The rtPCR process has been validated for nearly five hundred genes, and
more than 70 different multiplexes have been built. A variety of different samples have
been analyzed, including measuring expression response for 13 genes in a screen of a 20,000
compound library, 10x-pooled; measuring responses of ~400 genes to a set of 20
compounds that trigger apoptosis; time course studies tracking the responses for 450 genes
to cell treatments by the natural ligands FasL, TRAIL and TNF-alpha; and a single 20-plex
to analyze two dozen rat tissue samples.

Timecourse and CVs 

[0209] Figure 9 is a plot of 3 genes out of a 15-plex from a time course
treatment of 5x10^4 HepG2 cells with 25 µM emetine, a protein synthesis inhibitor.
Cells were treated in triplicate and timepoints were collected at T=0, 2, 4, 8, 24, 48, and 120
hours. RNA was isolated using Qiagen's RNeasy kit for 96 well plates and the
concentration was monitored using a Ribogreen Assay (Molecular Probes). The
concentration of RNA from each sample was normalized to 5 ng/µl. rtPCR was carried out
using 25 ng of RNA from each sample and the products were analyzed using an ABI 3100
Genetic Analyzer. Gene expression is expressed as a ratio to that of GAPDH. The
multiplex included the following list of genes.
[0210]
GRO1        melanoma growth stimulating activity, alpha, oncogene
IL-8        interleukin 8
HLA-C       homo sapiens major histocompatibility complex C
Caspase 3   apoptosis-related cysteine protease (transcript variant alpha, mRNA)
Bak         human bak protein
PLAU        plasminogen activator, urokinase
IL6ST       interleukin 6 signal transducer (gp130 oncostatin M receptor)
Caspase 1   apoptosis-related cysteine protease (IL-1, beta, convertase)
Cyclo A     cyclophilin A
Fas         fas ligand
Caspase 4   apoptosis-related cysteine protease
Serpine 1   serine proteinase inhibitor, clade E, member 1
IL-1        interleukin receptor, type 2
GAPDH       glyceraldehyde phosphate dehydrogenase
IFNAR2      interferon alpha, beta, & omega receptor 2

[0211] Exemplary data from the multiplex are shown in Figure 9, with clear trends of
induction for IL-8 and GRO1. CVs for all genes and data points within the experiment
ranged from a few percent to 20%.
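
The two quantities used in this example, the expression ratio to GAPDH and the coefficient of variation (CV) across replicates, can be sketched as follows. This is a minimal illustration rather than the analysis actually used: the peak values are hypothetical, and the CV is assumed to be the sample standard deviation divided by the mean, which the text does not specify.

```python
from statistics import mean, stdev


def relative_expression(gene_signal: float, reference_signal: float) -> float:
    """Express a gene's signal as a ratio to the reference gene (here GAPDH)."""
    return gene_signal / reference_signal


def percent_cv(values: list[float]) -> float:
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical signals for one gene and for GAPDH in three replicate wells.
gene_peaks = [1200.0, 1100.0, 1300.0]
gapdh_peaks = [10000.0, 10000.0, 10000.0]

ratios = [relative_expression(g, r) for g, r in zip(gene_peaks, gapdh_peaks)]
cv = percent_cv(ratios)

print("ratios to GAPDH:", [round(r, 3) for r in ratios])
print(f"CV across replicates: {cv:.1f}%")
```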



Linearity and Dynamic Range

[0212] The universal-primer-based rtPCR approach shows a wide dynamic range
and linear dose-response. To assess the dynamic range of RNA detection by the assay, a
commercially available purified 7.5 kb RNA (Gibco), also used as an external control, was
spiked into 20 ng of total RNA from cultured PC-3 cells in the range of 0.004 to 125
attomoles. The quantities of specific PCR product, relative to β-actin, were determined.
The dose-response was linear over this range of over 3 orders of magnitude (Figure 10). It
should be noted that this wide dynamic range is actually the range of the measured gene
expression ratio relative to β-actin (attenuated), as indicated on the Y axis. This range
permits measurements of fold change differences in gene expression of multiple
comparative samples of many orders of magnitude. Additionally, it allows the simultaneous
measurement of high and low copy number transcripts.

[0213] This experiment also demonstrated that the minimum detectable level of
spiked 7.5 kb RNA that could be distinguished from zero was 31 zeptomoles, or 1.9 x 10^4
molecules, indicated on the X axis. Thus the assay can detect on the order of one transcript
copy per cell using 10^4 cells. Furthermore, it is expected that utilization of a microarray
format readout will provide an additional sensitivity increase that can be used to reduce the
required RNA per reaction by at least 2 logs, down into the sub-nanogram levels. Such a
sensitivity increase makes it possible to run multiple multiplex reactions using only a few
nanograms of RNA, and enables researchers to measure expression values for hundreds of
genes using very small tissue samples such as those that can be acquired and selected using
laser capture microdissection.
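
The sensitivity figures quoted for the spiked 7.5 kb RNA can be checked with a short calculation. The sketch below only restates the arithmetic; the Avogadro constant is the standard value, and the function name is illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole
ZEPTO = 1e-21             # one zeptomole in moles


def zeptomoles_to_molecules(zmol: float) -> float:
    """Convert an amount in zeptomoles to a number of molecules."""
    return zmol * ZEPTO * AVOGADRO


molecules = zeptomoles_to_molecules(31)  # minimum detectable spike
copies_per_cell = molecules / 1e4        # spread over 10^4 cells

# Span from the detection limit (31 zmol = 0.031 amol) to the top spike (125 amol).
orders_of_magnitude = math.log10(125 / 0.031)

print(f"31 zmol = {molecules:.2e} molecules")
print(f"copies per cell with 10^4 cells = {copies_per_cell:.1f}")
print(f"detectable span = {orders_of_magnitude:.1f} orders of magnitude")
```

The result, a little under 1.9 x 10^4 molecules or roughly two copies per cell, is consistent with the statement that detection is on the order of one transcript copy per cell.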
Other Benefits of Universal-Primer-Based rtPCR
[0214] By limiting the concentration of gene-specific primers and using universal
primers for the bulk of the amplification process we gain the added benefit of relaxing the
constraints associated with the design of successful gene-specific primers. With the
concentrations of the gene-specific primers kept low (0.02 µM), their participation in
cross-reactions and mis-reactions is limited, leading to a higher probability of success in
amplification with a significantly reduced likelihood of creating artifacts.
[0215] Another major advantage of the technology is that the format is highly
flexible in terms of the numbers of genes versus numbers of samples used in a study. For
example, performance of 5,000 multiplex rtPCR reactions with 20 genes per reaction
generates ~100,000 data points. These 5,000 reactions can be used to measure 20 genes for
5,000 samples, 100 genes for 1,000 samples, 200 genes for 500 samples, or 1,000 genes for
100 samples. Note that, as will be described in the research plan, it is very straightforward
to spot all 5,000 reactions onto a single microarray slide for analysis.
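
The trade-off between genes and samples follows directly from the fixed reaction budget. As a minimal sketch, assuming every multiplex carries exactly 20 genes and every sample is run through all of the multiplexes needed (the function name is illustrative):

```python
def study_designs(total_reactions: int, genes_per_reaction: int,
                  gene_counts: list[int]) -> list[tuple[int, int]]:
    """For each target gene count, return (genes, samples) that fit the budget."""
    designs = []
    for genes in gene_counts:
        # Multiplex reactions needed per sample, rounded up.
        multiplexes_per_sample = -(-genes // genes_per_reaction)
        samples = total_reactions // multiplexes_per_sample
        designs.append((genes, samples))
    return designs


data_points = 5000 * 20
designs = study_designs(5000, 20, [20, 100, 200, 1000])

print("data points:", data_points)
for genes, samples in designs:
    print(f"{genes} genes for {samples} samples")
```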

Flipping the Microarray Paradigm

[0216] The analysis process involves flipping the current microarray paradigm:
the rtPCR products derived from the RNA samples are assembled into an array, and
the gene-specific oligonucleotide probes are hybridized to these arrays, as opposed to the
probes being placed on the surface and the samples in solution.

[0217] Standard microarrays differentiate the many genes being monitored using
specific spatial placement of gene-specific probes on the microarray surface. The methods
of the present invention use gene-specific probes that are differentiated by the use of two to
five different labels, e.g., fluorescent labels that can be uniquely identified by their
absorption/emission properties. While this approach does limit the number of genes that
can be probed within any single multiplexed rtPCR sample (an issue that is resolved simply
by making multiple replicates of an array), it leaves free the use of the spatial arraying
dimensions to parallelize the analysis of samples at a level 1 to 2 orders of magnitude
higher than can be attained using microtiter plate formats.

[0218] The process is shown schematically in Figure 1, wherein a large set of RNA
samples, commonly arrayed in microtiter plates, provides the source to generate a series of
rtPCR reactions that are then arrayed on one or more microarray slides. Typical microarray
slides can contain anywhere from a couple thousand to 20,000 "spots" where samples are
uniquely placed. Therefore, as many as 20,000 different amplified samples can be placed
and probed on a single slide. In the example shown in Figure 1, the slide is probed using 4
different oligonucleotide probes that target 4 different genes. The different probes carry 4
different fluorescent labels that can be uniquely detected and quantitated in the array reader.
[0219] The ability to analyze 4 different genes for 20,000 samples on a given slide
may seem limited in terms of gene depth. However, as stated above, it is trivial to replicate
a given slide using existing slide printing instruments to generate 100 or more
slides per set of samples. This replication process is shown schematically in Figure 2, and
clearly shows how, through the use of replicates, many more genes can be analyzed from
these same RNA samples. Printing, probing and scanning the microarray
plates are nearly parallel processes; therefore, it takes roughly the same time and resources to
analyze 20 plates as it does 1 (a difference of hours, not days).

[0220] The methods of the present invention offer many advantages over commonly
used dot blot methods. These advantages include a 3-4 log increase in sensitivity leading to
the use of much smaller quantities of RNA, multiplexing in probing and detection that
increases throughput and enables internal sample measurement of gene to control RNA
ratios, and greatly improved levels of probe discrimination through the use of rtPCR to
reduce sample complexity. The process adds complexity in terms of the number of sample
handling steps, but the use of current automated liquid handling tools, e.g., pipetting tools, limits
opportunities for sample mixups and pipetting variability while minimizing reagent usage.

Internal Reference Control + Number of Dyes 
[0221] In the example described above, the gene expression values are relative
expression values. Specifically, each rtPCR multiplex includes the amplification of one or
more "control" or "reference" genes. Example reference genes include the usual suspects of
β-actin, GAPDH, cyclophilin and others. The consequence is that one of the
oligonucleotide probes used to monitor each microarray needs to be used for a reference or
control gene. Therefore, if one is using 2 dyes per probing then one can only measure one
gene plus the reference. Using 5 dyes one can monitor 4 genes plus the reference. The
number of dyes that can be used will need to be tested empirically, but we will utilize a
state-of-the-art array scanner, such as the Perkin Elmer ScanArray Express, that can monitor
up to 5 dyes simultaneously. The number of dyes used directly determines the number of
microarray plate replicates that need to be made. For example, an rtPCR multiplex of 20
genes will need to be replicated onto 20 plates if only 2 dyes are used for analysis (1 gene
per array + reference), or onto 5 plates if 5 dyes are used (4 genes per array + reference).
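
The relationship between multiplex size, dye count, and the number of array replicates can be written as a one-line formula: replicates = ceil(genes / (dyes - reference probes)). A minimal sketch, assuming one dye per array is reserved for the reference gene (the function name and the `reference_probes` parameter are illustrative):

```python
import math


def array_replicates(genes_in_multiplex: int, dyes: int, reference_probes: int = 1) -> int:
    """Number of replicate arrays needed to probe every gene in a multiplex."""
    genes_per_array = dyes - reference_probes
    if genes_per_array < 1:
        raise ValueError("need at least one dye beyond the reference probe")
    return math.ceil(genes_in_multiplex / genes_per_array)


print(array_replicates(20, dyes=2))  # 1 gene per array + reference
print(array_replicates(20, dyes=5))  # 4 genes per array + reference
```

For the 20-gene multiplex in the text this gives 20 replicates with 2 dyes and 5 replicates with 5 dyes.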
Arraying Strategies

[0222] So far we have discussed only one arraying strategy for this technology,
namely where a single multiplex of ~20 genes is used to amplify 10,000 samples to fill up
the array. There are, however, other schemes in which multiple multiplexes can be spotted
onto the same array. Because each rtPCR only amplifies a targeted set of genes,
experiments can be designed where multiple multiplexes are used to amplify 100 genes, for
example. In the 100 gene scenario, 5 different multiplexes of 20 genes are independently
spotted onto the microarray surface. In a 10,000 spot array, 5 different multiplexes can be
spotted for 2,000 different biological samples. For each multiplex, 4 gene-specific
oligonucleotide probes (plus a reference) are created with a different dye conjugated to
each. The probes for each of the 5 different multiplexes can then be pooled and
simultaneously hybridized to the microarray. Because each probe will only hybridize to a
single gene in a single multiplex (unless otherwise desired, such as in the case of a standard
reference gene), and the different spot addresses are tracked for multiplex identity, the
different fluorescence signals can be directly correlated to an individual gene. Of course,
the typical concerns about homologies and cross hybridization need to be considered during
the experimental design phase.
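
The bookkeeping implied by this scheme, tracking each spot address for sample and multiplex identity and then resolving a dye signal to a gene, can be sketched as below. The layout (the 5 multiplexes of one sample in consecutive spots), the dye names, and the gene labels are all hypothetical placeholders; the text does not prescribe a particular layout.

```python
N_SPOTS = 10_000
N_MULTIPLEXES = 5
N_SAMPLES = N_SPOTS // N_MULTIPLEXES  # 2,000 biological samples

GENE_DYES = ("dye1", "dye2", "dye3", "dye4")  # hypothetical dye names
REFERENCE_DYE = "dye5"                         # hypothetical reference dye


def spot_identity(spot_index: int) -> tuple[int, int]:
    """Assumed layout: the 5 multiplexes of one sample occupy consecutive spots."""
    if not 0 <= spot_index < N_SPOTS:
        raise IndexError("spot index outside the array")
    sample, multiplex = divmod(spot_index, N_MULTIPLEXES)
    return sample, multiplex


# Pooled probe set: (multiplex, dye) -> gene label (placeholder labels).
PROBE_POOL = {(m, dye): f"multiplex{m}_gene_{dye}"
              for m in range(N_MULTIPLEXES) for dye in GENE_DYES}
PROBE_POOL.update({(m, REFERENCE_DYE): "reference" for m in range(N_MULTIPLEXES)})


def gene_for_signal(spot_index: int, dye: str) -> tuple[int, str]:
    """Resolve a fluorescence signal at a spot to (sample, gene)."""
    sample, multiplex = spot_identity(spot_index)
    return sample, PROBE_POOL[(multiplex, dye)]


print(N_SAMPLES)
print(gene_for_signal(7, "dye2"))
print(gene_for_signal(9999, "dye5"))
```

Because the same dye is reused across multiplexes, the spot address is what disambiguates the gene; the dye alone is not sufficient.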

[0223] In either the single or multiple multiplex case, the number of array replicates
needed is directly related to the size of the largest multiplex used and the number of
fluorescent dyes that can be simultaneously detected.



EXAMPLE 5: EXEMPLARY PROTOCOL

[0224] The following provides an exemplary procedure for the amplification and
array hybridization in the context of screening compound libraries according to the methods
of the invention.

Amplification of RNA using multiplex universal primer driven PCR 
[0225] Total RNA was obtained from cultured cells using an RNA isolation kit
(Qiagen RNeasy). 20 ng of isolated RNA was then used first in a reverse transcription
reaction and then in the PCR. Thirty-one genes were targeted for amplification with the primers
given in Table 1, according to the following conditions.

[0226]
Reverse Transcription
Gene Specific Reverse Primer     @ 0.05 µM
Tris-HCl                         10 mM
pH                               8.3
KCl                              50 mM
MgCl2                            2.5 mM
dNTPs                            1 mM
DTT                              0.01 M
RNase Inhibitor                  0.1 U
MMLV Reverse Transcriptase       1.0 U
Volume                           20 µl

Thermal Cycler Conditions
48 °C     1 minute
37 °C     5 minutes
42 °C     60 minutes
95 °C     5 minutes
4 °C      end

Polymerase Chain Reaction
cDNA                             10 µl*
Gene Specific Forward Primer     @ 0.02 µM
Tris-HCl                         10 mM
pH                               8.3
KCl                              50 mM
MgCl2                            7 mM
dNTPs                            0.3 mM
Universal Forward Primer         1 µM
Universal Reverse Primer         1 µM
Taq Polymerase                   2.5 U
Volume                           20 µl

Thermal Cycler Conditions
95 °C     10 minutes
94 °C     30 seconds
55 °C     30 seconds
68 °C     60 seconds
repeat steps 2-4, 35 cycles
4 °C      end

* of 20 µl reverse transcription reaction
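
As a rough planning aid, the programmed hold times of the two thermal cycler programs listed above can be totaled. The sketch below is illustrative only: it assumes that steps 2-4 of the PCR program run 35 times in total, and it ignores ramp times and the final 4 °C hold, so actual run times will be longer.

```python
# PCR program from the conditions above: (temperature in deg C, hold in seconds).
PCR_INITIAL = (95, 600)
PCR_CYCLE = [(94, 30), (55, 30), (68, 60)]
PCR_CYCLES = 35  # assumption: steps 2-4 are run 35 times in total

# Reverse transcription holds: (temperature in deg C, hold in minutes),
# excluding the final 4 deg C hold.
RT_HOLDS_MIN = [(48, 1), (37, 5), (42, 60), (95, 5)]


def pcr_hold_minutes(initial, cycle, cycles):
    """Total programmed hold time of the PCR program, in minutes."""
    per_cycle_s = sum(seconds for _, seconds in cycle)
    return (initial[1] + cycles * per_cycle_s) / 60.0


def rt_hold_minutes(holds):
    """Total programmed hold time of the reverse transcription program, in minutes."""
    return sum(minutes for _, minutes in holds)


pcr_minutes = pcr_hold_minutes(PCR_INITIAL, PCR_CYCLE, PCR_CYCLES)
rt_minutes = rt_hold_minutes(RT_HOLDS_MIN)

print(f"reverse transcription holds: {rt_minutes} min")
print(f"PCR holds: {pcr_minutes:.0f} min")
```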
AGGTGACACTATAGAATAGTCCGAGTTCTCTGCAGGTC (SEQ ID NO: 31)
GTACGACTCACTATAGGGAAAATACACCTGGTTTGGGCA (SEQ ID NO: 32)
AGGTGACACTATAGAATAGGAGAAACTTGCTACCGCAC (SEQ ID NO: 33)
GTACGACTCACTATAGGGAAAGGGGAATTTCAGGCATTT (SEQ ID NO: 34)
AGGTGACACTATAGAATATTTTTCCCTGTGTTCTTGGG (SEQ ID NO: 35)
GTACGACTCACTATAGGGAAAGGAGGTGCAACCACACAT (SEQ ID NO: 36)
AGGTGACACTATAGAATACCAACAGAAACCACCGTTCT (SEQ ID NO: 37)
GTACGACTCACTATAGGGAGAGGTCAAGCTGCTCAGGTC (SEQ ID NO: 38)
AGGTGACACTATAGAATACCAAAGCCTCAGGAACAAGA (SEQ ID NO: 39)
GTACGACTCACTATAGGGATGCTGACCTTCTTCCATTCC (SEQ ID NO: 40)
AGGTGACACTATAGAATAGGGCTGTCCATGTCATCTCT (SEQ ID NO: 41)
GTACGACTCACTATAGGGACCAGGGTCACAGTAGGGAGA (SEQ ID NO: 42)
AGGTGACACTATAGAATATCTTGCCCCTGATATCACAA (SEQ ID NO: 43)
AGGTGACACTATAGAATAGCCCTGATGTCGGCTAAGTA (SEQ ID NO: 45)
GTACGACTCACTATAGGGATGCAGTTTTCTGGGAGTGTG (SEQ ID NO: 46)
AGGTGACACTATAGAATAATGGATGAAACAGCTGAGCA (SEQ ID NO: 47)
GTACGACTCACTATAGGGAGCGCTCTACGCAAAGTGAAT (SEQ ID NO: 48)
AGGTGACACTATAGAATATGTGGGAACAGGAACATTCA (SEQ ID NO: 49)
GTACGACTCACTATAGGGATGTCTTTCCTGCTTGGCTCT (SEQ ID NO: 50)
AGGTGACACTATAGAATATTCTACATTTGAGGGCCCAG (SEQ ID NO: 51)
GTACGACTCACTATAGGGACAAAACATGCCACGAATGAG (SEQ ID NO: 52)
AGGTGACACTATAGAATAGCAATCTAAGCAGGGGTCTG (SEQ ID NO: 53)
GTACGACTCACTATAGGGACAGCACTTAGATTCGGAGCC (SEQ ID NO: 54)
AGGTGACACTATAGAATATAACATGGAGGAGACCAGGC (SEQ ID NO: 55)
GTACGACTCACTATAGGGACCCTGGAGCAGTTTTGTAGC (SEQ ID NO: 56)
AGGTGACACTATAGAATAGGGAATCGGAAGGGTTCATA (SEQ ID NO: 57)
GTACGACTCACTATAGGGAGGAGGGACCAACCTTGAAAT (SEQ ID NO: 58)
AGGTGACACTATAGAATACTGTCAGAAGAGGAGACCCG (SEQ ID NO: 59)
GTACGACTCACTATAGGGAGCAAATTTTCTGGCTTGAGG (SEQ ID NO: 60)
AGGTGACACTATAGAATATCAGTACTAAACCCCCGCTG (SEQ ID NO: 61)
GTACGACTCACTATAGGGATTTGGGCGATATTTTTCCAC (SEQ ID NO: 62)
AGGTGACACTATAGAATAGAAGTGTTCCGTCCTGGCTA (SEQ ID NO: 63)
GTACGACTCACTATAGGGATGCTGAATACAGACTTGGCG (SEQ ID NO: 64)
AGGTGACACTATAGAATAGGGGGTTTATGAGCCACATT (SEQ ID NO: 65)
GTACGACTCACTATAGGGATTTAGGGAACCTCCGTGAGA (SEQ ID NO: 66)
AGGTGACACTATAGAATATGGGTGTGGATTCTGTTCTG (SEQ ID NO: 67)
GTACGACTCACTATAGGGATGGGGTTTGAAGTTGGAATC (SEQ ID NO: 68)
AGGTGACACTATAGAATATGCAAAGGGAAATGCACATA (SEQ ID NO: 69)
GTACGACTCACTATAGGGACCTTCCCAGAGCTCAATCAG (SEQ ID NO: 70)
AGGTGACACTATAGAATACGAACTTTGACAGCGACAAG (SEQ ID NO: 71)
GTACGACTCACTATAGGGACCCTCAGTGAAGCGGTACAT (SEQ ID NO: 72)
AGGTGACACTATAGAATACGCAAAGAAAGCTCAGGAAA (SEQ ID NO: 73)
GTACGACTCACTATAGGGAAGACAAGACAGGCTGGCACT (SEQ ID NO: 74)
AGGTGACACTATAGAATATCTCCATCTCCTGACCTCGT (SEQ ID NO: 75)
GTACGACTCACTATAGGGACTTGGTCTCCCAAAGTGCTC (SEQ ID NO: 76)
AGGTGACACTATAGAATAGGTGGAGCAGTTCCTGTGTT (SEQ ID NO: 77)
GTACGACTCACTATAGGGATTCACATTGCACTGGAAAGC (SEQ ID NO: 78)
AGGTGACACTATAGAATAACCGGCTTCCTCATTACCTT (SEQ ID NO: 79)
GTACGACTCACTATAGGGAGACATTGGTGGTGGTCTCCT (SEQ ID NO: 80)
AGGTGACACTATAGAATATCCAGGCCACTTTTCACTTC (SEQ ID NO: 81)
GTACGACTCACTATAGGGACTCTTCCGTGGTGGAGTAGC (SEQ ID NO: 82)
AGGTGACACTATAGAATAGTGGTTCCTGAACCTGTTGC (SEQ ID NO: 83)
GTACGACTCACTATAGGGAGAGCTTGCCATTCAGAGAGG (SEQ ID NO: 84)
AGGTGACACTATAGAATAGAAGGGAGAGGAAGGGAGTG (SEQ ID NO: 85)
GTACGACTCACTATAGGGATCAAAGGACACAACGAGCAG (SEQ ID NO: 86)
AGGTGACACTATAGAATAGGACGAGATCAAGCCCTACA (SEQ ID NO: 87)
GTACGACTCACTATAGGGACGCGGAAGTCCTCTAGACAG (SEQ ID NO: 88)
AGGTGACACTATAGAATATGGATCCCGGAATAGTCAAC (SEQ ID NO: 89)
GTACGACTCACTATAGGGAGGCACAGGAAGCCATAAAGA (SEQ ID NO: 90)
AGGTGACACTATAGAATATTTTGGGACGTAAAAGCTGG (SEQ ID NO: 91)
GTACGACTCACTATAGGGATTTGAAGGGGTTTGCTTGTC (SEQ ID NO: 92)
AGGTGACACTATAGAATACTTCCTGCAGAGAGAGGAGC (SEQ ID NO: 93)
GTACGACTCACTATAGGGAACACCAAAATACCCCATCCA (SEQ ID NO: 94)
AGGTGACACTATAGAATAATGTACTTGGAGGACCGCAC (SEQ ID NO: 95)
GTACGACTCACTATAGGGATGCCTCTTATCAGCCAGGTC (SEQ ID NO: 96)
AGGTGACACTATAGAATAAACATTGAATGGCACAGCAA (SEQ ID NO: 97)
GTACGACTCACTATAGGGAAACCAGGCACAAGGTTCAAG (SEQ ID NO: 98)
AGGTGACACTATAGAATAATTCTGGCAAAGCCAATCTG (SEQ ID NO: 99)
GTACGACTCACTATAGGGAGATGGTGTTGCAGGATGTTG (SEQ ID NO: 100)
AGGTGACACTATAGAATAATCAGCATTTCCAACCACAA
GTACGACTCACTATAGGGAGTCTCGCTAATAACCCCAGC
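
The chimeric structure described earlier (a universal sequence on the 5' end followed by a gene-specific sequence) is visible in the listed primers, which fall into two groups that each share a common 5' prefix. The sketch below recovers that shared prefix from a few of the listed sequences and splits off the remainder. Treating the shared prefix as the universal portion is an inference from the listing, not a statement from the text, and the function names are illustrative.

```python
import os


def clean(seq: str) -> str:
    """Keep only A/C/G/T so that stray spaces or punctuation are ignored."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")


def split_primers(raw_primers: list[str]) -> tuple[str, list[str]]:
    """Return (shared 5' prefix, list of remaining 3' portions)."""
    cleaned = [clean(p) for p in raw_primers]
    prefix = os.path.commonprefix(cleaned)  # character-wise common prefix
    return prefix, [p[len(prefix):] for p in cleaned]


# A few primers from the listing (SEQ ID NOs: 32, 38, 40, 42).
even_group = [
    "GTACGACTCACTATAGGGAAAATACACCTGGTTTGGGCA",
    "GTACGACTCACTATAGGGAGAGGTCAAGCTGCTCAGGTC",
    "GTACGACTCACTATAGGGATGCTGACCTTCTTCCATTCC",
    "GTACGACTCACTATAGGGACCAGGGTCACAGTAGGGAGA",
]
# A few primers from the listing (SEQ ID NOs: 31, 37, 43, 47).
odd_group = [
    "AGGTGACACTATAGAATAGTCCGAGTTCTCTGCAGGTC",
    "AGGTGACACTATAGAATACCAACAGAAACCACCGTTCT",
    "AGGTGACACTATAGAATATCTTGCCCCTGATATCACAA",
    "AGGTGACACTATAGAATAATGGATGAAACAGCTGAGCA",
]

even_prefix, even_specific = split_primers(even_group)
odd_prefix, odd_specific = split_primers(odd_group)

print(even_prefix, [len(s) for s in even_specific])
print(odd_prefix, [len(s) for s in odd_specific])
```

With these inputs each primer splits into a shared prefix followed by a 20-nucleotide remainder.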
Microarray spotting of PCR Products
[0227] PCR products were purified using Promega Wizard PCR purification kits
and protocols. The PCR products were then diluted and mixed with DMSO to a final
concentration of 50% DMSO. As shown in Figure 11, rtPCR reactions were performed
independently using multiplexes 1 and 2 on an RNA sample. The two PCR reactions were
purified and then mixed together in different ratios ranging from 99:1 to 1:99, wherein the
total amount of PCR product had a final concentration of 44 ng/µl. The PCR/DMSO mix
was then spotted onto aminosilane coated slides (Sigma) in 12 replicates and baked at 85°C
for 1 hour to immobilize the DNA. The spotted slides were prehybridized with 5X SSC
buffer containing 0.1% SDS and 1% BSA at 42°C for 45 minutes. The slides were then
washed twice with water and once with isopropanol, then dried.
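
The concentration of each multiplex in a given mixture follows from the mixing ratio and the fixed total of 44 ng/µl. A minimal sketch, in which the 50:50 point is a hypothetical intermediate ratio added for illustration (only the 99:1 and 1:99 endpoints are stated in the text):

```python
TOTAL_NG_PER_UL = 44.0  # total PCR product concentration stated in the text


def mix_concentrations(parts_m1: float, parts_m2: float,
                       total: float = TOTAL_NG_PER_UL) -> tuple[float, float]:
    """Split the total concentration between multiplex 1 and multiplex 2."""
    whole = parts_m1 + parts_m2
    return total * parts_m1 / whole, total * parts_m2 / whole


for ratio in [(99, 1), (50, 50), (1, 99)]:
    m1, m2 = mix_concentrations(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> multiplex 1 {m1:.2f} ng/ul, multiplex 2 {m2:.2f} ng/ul")
```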

Probe Hybridization 

[0228] Fluorescently labeled oligonucleotide probe, e.g., end labeled with Cy3 or
Cy5, was prepared at a concentration of 1 µM in 1X hybridization buffer (4X SSC, 0.02%
Tween 20, 1 Unit/ml poly dA, and 1 µg/µl yeast tRNA). In the example illustrated in Figure
11, an oligonucleotide probe for the gene RFC4 (Cy5) present only in multiplex 2 was
incubated at 95°C for 3 minutes and 4°C for 30 seconds. 35 µl of probe was added to the
prepared microarray slides, covered with a microscope glass coverslip and incubated in a
humidified chamber at 42°C for 1 hour. Following hybridization, the coverslip was
removed and the slides were washed first with a low stringency buffer containing 1X SSC
and 0.2% SDS at 42°C, then twice with a high stringency buffer containing 0.1X SSC and
0.2% SDS at 22°C, and finally twice with 0.1X SSC at 22°C. The slides were then dried
and scanned.

Slide Scanning 

[0229] Scanning was performed using an Axon Instruments GenePix microarray
scanner using the standard protocols recommended by the manufacturer. Data were then
imported into Axon Acuity software for analysis. As shown in Figure 11, the amount of
fluorescence signal increases as the quantity of multiplex 2 in the sample increases from 0
to 44 ng/µl.

[0230] While the foregoing invention has been described in some detail for purposes
of clarity and understanding, it will be clear to one skilled in the art from a reading of this
disclosure that various changes in form and detail can be made without departing from the
true scope of the invention. For example, all the techniques and apparatus described above
can be used in various combinations. All publications, patents, patent applications, and/or
other documents cited in this application are incorporated by reference in their entirety for
all purposes to the same extent as if each individual publication, patent, patent application,
and/or other document were individually indicated to be incorporated by reference for all
purposes.